
Feature request - Prometheus metrics support #318

Open
varun06 opened this issue Jan 2, 2018 · 11 comments

varun06 commented Jan 2, 2018

Now that #149 is closed, can we decide on the right approach and add Prometheus metrics support to the project?

@dmitryilyin

There is an ongoing effort to build a telegraf input plugin for the Burrow API here: influxdata/telegraf#3489

It can then be used to send metrics to Graphite, InfluxDB, Prometheus and others.


varun06 commented Jan 17, 2018

@dmitryilyin does that mean "use both telegraf and burrow", or can that effort be used to add Prometheus support to burrow itself?


solsson commented Jan 17, 2018

@dmitryilyin What advantages do you see in exporting via Telegraf?


dmitryilyin commented Jan 18, 2018

Yes, it means using them both. Adding Prometheus-format metrics to burrow is indeed useful, but other people will (and already do) want Graphite output, others are writing an InfluxDB connector, and there are many more monitoring systems around.

On the other hand, telegraf works as a Swiss Army knife. It has a lot of input plugins (https://github.com/influxdata/telegraf/tree/master/plugins/inputs) and can easily be extended with exec reporter scripts, so it can gather and receive metrics from a lot of things, including gathering system metrics much better than Prometheus' node_exporter, which you are probably using anyway.
It can output metrics to a lot of things too, including Prometheus and Graphite (https://github.com/influxdata/telegraf/tree/master/plugins/outputs), although different metric formats and styles can complicate things.

The Prometheus style is to have a lot of different exporters and/or integrate metrics gathering into applications, plus a push gateway for scripts. Which approach is better? Who knows.

If you have only Prometheus and are not going to integrate with anything else, then perhaps you don't need telegraf at all and can use burrow_exporter or integrate metrics into burrow itself; if you do need to talk to many other systems, telegraf may be the better choice.

Anyway, adding Prometheus metrics directly to burrow would be helpful. It would also let telegraf use its Prometheus protocol support against Burrow instead of going through burrow's API. Whether that would be better remains to be seen.


varun06 commented Jan 18, 2018

That makes sense, but yeah, adding Prometheus support to burrow is going to be helpful too.


solsson commented Jan 18, 2018

I think it should be noted that exporting to Prometheus doesn't come with the usual complexities of maintaining an integration. It's an HTTP endpoint, nothing else. Very much like the GET endpoints in the /v3 API, but with plaintext instead of JSON.

It'd be great if the discussion for how to map the current responses to Prometheus labels took place in this repo. It affects how useful the exported metrics are for consumer lag monitoring.
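
To make that concrete, here is a minimal sketch of such an endpoint in plain Go, with no client library at all; the metric name, label values, and hard-coded number are invented for the example, and the port is arbitrary:

package main

import (
	"fmt"
	"net/http"
)

func main() {
	// A /metrics endpoint is just another GET handler; the body is the
	// Prometheus text exposition format instead of JSON.
	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprintln(w, "# HELP example_partition_offset A hand-written sample metric.")
		fmt.Fprintln(w, "# TYPE example_partition_offset gauge")
		fmt.Fprintln(w, `example_partition_offset{cluster="local",topic="__consumer_offsets",partition="12"} 2428`)
	})
	http.ListenAndServe(":8080", nil)
}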

If you have only Prometheus and are not going to integrate with anything else, then perhaps you don't need telegraf at all and can use burrow_exporter or integrate metrics into burrow itself; if you do need to talk to many other systems, telegraf may be the better choice.

Using burrow_exporter is ok, though it adds a delay (unless its polling is perfectly synced with Prometheus pull) and some overhead. It too needs a discussion on mapping to labels. Is anyone interested in helping out with jirwin/burrow_exporter#9, i.e. support for the current API version?


solsson commented Jan 19, 2018

This is a sample metric I get out of burrow_exporter after my v3 search-and-replace:

# HELP kafka_burrow_topic_partition_offset The latest offset on a topic's partition as reported by burrow.
# TYPE kafka_burrow_topic_partition_offset gauge
kafka_burrow_topic_partition_offset{cluster="local",partition="12",topic="__consumer_offsets"} 2428

I think these labels make sense.
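
As a rough sketch of how those labels could be wired up natively with the prometheus/client_golang library (the recordOffset helper and the values passed to it are assumptions for illustration, not actual Burrow code):

package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// partitionOffset mirrors the sample above: one gauge per
// cluster/topic/partition combination.
var partitionOffset = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "kafka_burrow_topic_partition_offset",
		Help: "The latest offset on a topic's partition as reported by burrow.",
	},
	[]string{"cluster", "topic", "partition"},
)

// recordOffset would be called wherever the application updates its view of broker offsets.
func recordOffset(cluster, topic, partition string, offset float64) {
	partitionOffset.WithLabelValues(cluster, topic, partition).Set(offset)
}

func main() {
	prometheus.MustRegister(partitionOffset)
	recordOffset("local", "__consumer_offsets", "12", 2428)

	// Serve the registered metrics in the text exposition format.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}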

I had a quick look at the source to try to get the lag export working, but rather than spending time on the structs there... could anyone hint at how to get hold of these data structures (https://github.com/linkedin/Burrow/wiki/Templates#data-in-templates) inside Burrow instead, whenever they change?


solsson commented Jan 19, 2018

An argument for an external exporter might be that it can do actual integrations without adding complexity to Burrow. For example, it could look up owner IPs from partition info in the Kubernetes API, to tag metrics with an optional owner_pod_name.

I think the exporter is OK with v3 as of jirwin/burrow_exporter#9 (comment); see the sample export there. I think the labels are good, and they'll be forward compatible even if more labels are added later.


Xaelias commented Mar 14, 2019

One of the big drawbacks of an external integration like the burrow exporter linked here is that it has its own scrape interval, on top of the Prometheus scrape interval.
As mentioned above, Prometheus metrics are just a plaintext representation of what burrow already has, so having that inside burrow shouldn't add a whole lot of complexity. I would also rather not have to rely on 2/3/... projects just to track Kafka lag :-D


Xaelias commented Mar 22, 2019

Oh, also, the burrow exporter is actually buggy. It looks like the maintainer is not responsive (although hopefully they will respond later), and I just don't have the Go expertise to fix the net/http code myself, so...


shamil commented May 30, 2019

One of the big drawbacks of an external integration like the burrow exporter linked here is that it has its own scrape interval, on top of the Prometheus scrape interval.

This is fixed in my fork, which is mostly a full refactor (except the burrow client). I'm now using a custom collector implementation, which means the scrape happens on demand when the /metrics endpoint is scraped by Prometheus: https://github.com/shamil/burrow_exporter
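
For anyone curious what "scrape happens on demand" means in practice: with a custom prometheus.Collector, the registry calls Collect every time Prometheus hits /metrics, so there is no second polling interval. A hedged sketch of the pattern (fetchOffsets and the metric name are placeholders, not the fork's actual code):

package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// offsetSample stands in for whatever the exporter reads from Burrow's HTTP API.
type offsetSample struct {
	cluster, topic, partition string
	offset                    float64
}

// fetchOffsets is a hypothetical stand-in for querying Burrow's /v3 API.
func fetchOffsets() []offsetSample {
	return []offsetSample{{"local", "__consumer_offsets", "12", 2428}}
}

type burrowCollector struct {
	offsetDesc *prometheus.Desc
}

func newBurrowCollector() *burrowCollector {
	return &burrowCollector{
		offsetDesc: prometheus.NewDesc(
			"kafka_burrow_topic_partition_offset",
			"The latest offset on a topic's partition as reported by burrow.",
			[]string{"cluster", "topic", "partition"}, nil,
		),
	}
}

func (c *burrowCollector) Describe(ch chan<- *prometheus.Desc) {
	ch <- c.offsetDesc
}

// Collect runs on every Prometheus scrape, so data is fetched on demand
// rather than on a separate polling interval.
func (c *burrowCollector) Collect(ch chan<- prometheus.Metric) {
	for _, s := range fetchOffsets() {
		ch <- prometheus.MustNewConstMetric(
			c.offsetDesc, prometheus.GaugeValue, s.offset,
			s.cluster, s.topic, s.partition,
		)
	}
}

func main() {
	prometheus.MustRegister(newBurrowCollector())
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}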
