Stats over HTTP API #817

Closed
grobie opened this issue Mar 26, 2015 · 15 comments
Labels: theme/operator-usability (replaces UX; anything related to making things easier for the practitioner), type/enhancement (proposed improvement or new feature)

Comments

@grobie

grobie commented Mar 26, 2015

Stats and runtime information are currently available via the stats RPC and the consul info CLI command. Clients that already use the HTTP API for catalog, KV, etc. would find it easier to fetch the stats via the HTTP API as well. That reduces the configuration surface and should also keep client code bases simpler, as only one client object is needed. I'm happy to implement the endpoint; I just wanted to check whether this has been discussed before or whether there are arguments against such an endpoint.

Just testing the waters here: what opinions do people have on the exposed format? I'm totally fine with implementing a general-purpose JSON representation. In the end, though, I want to retrieve the stats so they can be scraped by Prometheus. If there are no strong opinions about the format, can I throw the Prometheus client format into the ring? That would allow people to scrape Consul with Prometheus without any further setup.

Let me know what you think :)

@juliusv

juliusv commented Mar 26, 2015

Besides the somewhat formal spec doc, here is a good example from cAdvisor of what the text metrics format actually looks like:

https://github.com/google/cadvisor/blob/master/metrics/testdata/prometheus_metrics

The other alternative is the protobuf format.

But if you use the Go client library (as cAdvisor and etcd do), it handles all of this automatically anyway.

@discordianfish
Contributor

👍

Given that Consul uses go-metrics and there is already Prometheus support in there, it should be pretty easy to implement this. I started working on a PR, although it creates a new HTTP listener and isn't ready yet because some metric names aren't valid Prometheus metric names. I agree that this should rather be part of the main HTTP API, so I'm all in favor of doing this!

@juliusv

juliusv commented Mar 26, 2015

OK, if this is done via go-metrics rather than the native Prometheus client library, there are currently some drawbacks: go-metrics only supports flat metrics (a list of components per metric, which e.g. the statsd output joins into a dot-separated metric name) with no explicit naming of dimensions. There are plans to allow mapping those dotted components into more useful dimensional Prometheus metrics, though: hashicorp/go-metrics#9

However, the best option is usually to use the Prometheus client library natively, as that gives the best access to the data model and the efficiency of the client library's implementation.
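A rough sketch of the flattening problem, assuming a go-metrics style key (a `[]string` of components): joining the components and replacing characters that Prometheus does not allow in metric names yields a valid name, but any embedded value such as a service name becomes part of the name itself. The helper `flatName` and the example key are hypothetical, not the actual go-metrics sink code.

```go
package main

import (
	"fmt"
	"strings"
)

// isNameRune reports whether r may appear in a Prometheus metric name
// (ASCII letters, digits, underscore and colon).
func isNameRune(r rune) bool {
	return r == '_' || r == ':' ||
		(r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') ||
		(r >= '0' && r <= '9')
}

// flatName joins flat key components with "_" and replaces every character
// that is not allowed in a metric name with "_". A leading digit is not
// allowed either, so the name is prefixed with "_" in that case.
func flatName(key []string) string {
	joined := strings.Join(key, "_")
	var b strings.Builder
	for i, r := range joined {
		if i == 0 && r >= '0' && r <= '9' {
			b.WriteByte('_')
		}
		if isNameRune(r) {
			b.WriteRune(r)
		} else {
			b.WriteByte('_')
		}
	}
	return b.String()
}

func main() {
	// Hypothetical key: the service name ends up baked into the metric name,
	// so every service produces a separate metric instead of a label value.
	fmt.Println(flatName([]string{"consul", "catalog", "service", "web-frontend"}))
}
```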

@armon
Member

armon commented Mar 26, 2015

I think we would expose the stats in the same map-style format that the RPC layer already uses, to maximize code reuse; a simple JSON encoding of it would work fine. The API is more about creating a "pull" mechanism to fetch the stats, whereas go-metrics is designed around the "push" model of streaming the metrics to an external system.

@juliusv

juliusv commented Mar 26, 2015

@armon Yup, Prometheus is about "pull" as well, so it's a bit of an odd one out in go-metrics. But if you want to keep it to a JSON format, then you'll need custom code, yes. The question would then be whether it would also be possible to expose it as Prometheus metrics on some other endpoint (or via content-type negotiation on the same endpoint).

@bogdanov1609
Contributor

👍

@discordianfish
Contributor

@armon Have you thought about this any more? It came up here again and I would like to solve it somehow. You said the RPC layer already accesses the metrics, but the metrics there seem to be different, and the only way I know of to get metric state out of go-metrics is via the Prometheus sink or a new sink. I'm happy to implement that, but it's a bit unclear to me how it's supposed to work.
Right now the simplest option seems to be using the Prometheus sink. If that works for you, I can submit a new PR for that. It still has the problem of only supporting flat metrics, but everything else would lead to redundant metrics code or require heavy refactoring of go-metrics.

@discordianfish
Contributor

@armon Here is a new branch: master...discordianfish:fish/add-prometheus-sink
I've decided against using the existing API listener because people usually bind it to 127.0.0.1, whereas the metrics listener needs to be reachable from other hosts.
This also depends on hashicorp/go-metrics#30; without it, Consul might crash if service tags are used.

Even though it already helps me a lot in debugging our ongoing Consul issues, I'm not very happy with the resulting metrics, for two reasons:

a) They are not named very well. Even if you don't want to adopt the Prometheus format, maybe you'd be open to adhering to some Prometheus best practices that are, IMO, universal: https://prometheus.io/docs/practices/naming/

b) They are flat. I'll submit another PR to go-metrics to make the resulting metrics a bit 'better', but ultimately, if the go-metrics interface only supports flat metrics, we won't be able to solve this properly. What are your thoughts on that? Have you considered refactoring the metrics to support multidimensional ones? The Datadog support also looks a bit hackish in that regard: https://github.com/armon/go-metrics/blob/master/datadog/dogstatsd.go#L60
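To illustrate why flat keys are hard to map, here is a hypothetical sketch that splits a flat key into a metric name plus labels. It only works because the caller supplies which key positions are label values (`labelAt`); that information is exactly what a flat `[]string` key does not carry. The key layout and the `toDimensional` helper are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Metric is a name plus labels, the shape a dimensional backend wants.
type Metric struct {
	Name   string
	Labels map[string]string
}

// toDimensional splits a flat key into a metric name and labels. labelAt
// maps key positions that hold label values to the label's name; this
// knowledge has to come from outside the key (here: from the caller).
func toDimensional(key []string, labelAt map[int]string) Metric {
	m := Metric{Labels: map[string]string{}}
	var parts []string
	for i, part := range key {
		if label, ok := labelAt[i]; ok {
			m.Labels[label] = part
			continue
		}
		parts = append(parts, part)
	}
	m.Name = strings.Join(parts, "_")
	return m
}

func main() {
	// Hypothetical key with the service name at position 4.
	m := toDimensional([]string{"consul", "catalog", "service", "query", "web"},
		map[int]string{4: "service"})
	fmt.Println(m.Name, m.Labels)
}
```

Every metric would need its own `labelAt` rule, which is why this kind of mapping tends to end up as per-metric configuration rather than something the sink can infer.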

@discordianfish
Contributor

I'm now using my fork in our infrastructure and came to the conclusion that it doesn't make sense to just expose the existing metrics for Prometheus, because they are flat. Any 'hack' like the one in my fork won't yield good metrics, so right now the Prometheus statsd exporter is the best, if still problematic, way to get proper multidimensional metrics.
There are a few options to really solve this problem:

  1. Refactor go-metrics, or build something new, to support both flat (statsd/Graphite) and multidimensional metrics (Prometheus, InfluxDB, Datadog).
    While this seems like the most convenient option, it is definitely a non-trivial task, especially if you want to adhere to the naming conventions and best practices established in the communities around the various monitoring systems.

  2. Use a custom, Consul-specific exposition format as part of the API.
    Every monitoring system would need some form of adapter. This separates the concerns of the monitoring systems (domain-specific best practices around naming, etc.) from those of Consul.

  3. Use the Prometheus client library.
    As a Prometheus maintainer I'm obviously biased, but the client library is completely independent of the Prometheus server. It provides an easy-to-parse text-based format as well as a fast, well-defined protobuf format, and it suggests a set of general naming best practices that would also help people write adapters for other monitoring systems. A lot of tools in the infrastructure ecosystem (etcd, Kubernetes, etc.) already use the client library.

@blaubaer

👍

@jeinwag

jeinwag commented Sep 19, 2016

So considering the discussion in hashicorp/vault#1415, I guess using the Prometheus library is off the table. Any suggestions on how to move forward?

@grobie
Author

grobie commented Oct 20, 2016

Vault is a lot more concerned about security than Consul or other software, so I don't see how those arguments apply here. There are no known security issues with the Prometheus client library, and it's already used in a lot of projects.

@ghost

ghost commented Jan 31, 2017

Is there any plan to implement this (telemetry exposed via the CLI or API)? I'd love to be able to properly monitor Consul with my existing monitoring solution without using statsd/Python/etc.

@evilezh

evilezh commented Feb 3, 2017

👍

@guidoiaquinti
Contributor

@kyhavlov Any plans to re-evaluate this issue?

The Prometheus metrics format is now the de facto standard for metrics in the 'cloud native' world. I think adopting the Prometheus client library could be a great addition to this project! Thanks!
