This repository has been archived by the owner on Aug 26, 2020. It is now read-only.

partition level metrics #12

Merged
merged 4 commits into from
Jan 24, 2019

sundy-li
Owner

No description provided.

@juhoautio

Thanks! I tried this locally and I can see logsize & offsize written to influxdb split by a new partition dimension.

What's confusing though is that not every partition is reported for each timestamp. Have you thought about that?

@sundy-li
Owner Author

Hi, the reason is that we cannot get the offsize of every partition at every interval: Kafka stores committed offsets in the __consumer_offsets topic, and we just consume from that topic to get the offset change events.
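To illustrate why the per-partition series come out sparse, and one way to make them dense, here is a minimal Go sketch. It assumes a hypothetical, already-decoded `commitEvent` read from `__consumer_offsets` (decoding the topic is out of scope), and it is not burrowx's actual code. Each event only updates the partitions it mentions, so writing points on events alone yields only those partitions. Keeping the last known offset per partition and snapshotting the whole map on a timer yields every partition at every interval.

```go
package main

import (
	"fmt"
	"sort"
)

// partitionKey identifies one committed offset (hypothetical type).
type partitionKey struct {
	Group     string
	Topic     string
	Partition int32
}

// commitEvent is a hypothetical, already-decoded offset commit from __consumer_offsets.
type commitEvent struct {
	Key    partitionKey
	Offset int64
}

// offsetCache remembers the last committed offset of every partition seen so far.
type offsetCache struct {
	last map[partitionKey]int64
}

func newOffsetCache() *offsetCache {
	return &offsetCache{last: make(map[partitionKey]int64)}
}

// Apply records one commit; partitions not mentioned keep their previous value.
func (c *offsetCache) Apply(ev commitEvent) {
	c.last[ev.Key] = ev.Offset
}

// Snapshot returns every known partition, sorted for stable output,
// so a writer can emit all of them with one shared timestamp.
func (c *offsetCache) Snapshot() []commitEvent {
	out := make([]commitEvent, 0, len(c.last))
	for k, v := range c.last {
		out = append(out, commitEvent{Key: k, Offset: v})
	}
	sort.Slice(out, func(i, j int) bool {
		a, b := out[i].Key, out[j].Key
		if a.Group != b.Group {
			return a.Group < b.Group
		}
		if a.Topic != b.Topic {
			return a.Topic < b.Topic
		}
		return a.Partition < b.Partition
	})
	return out
}

func main() {
	c := newOffsetCache()
	c.Apply(commitEvent{partitionKey{"my_group", "my_topic", 0}, 100})
	c.Apply(commitEvent{partitionKey{"my_group", "my_topic", 1}, 200})
	// A later commit touches only partition 0 ...
	c.Apply(commitEvent{partitionKey{"my_group", "my_topic", 0}, 150})
	// ... but the snapshot still reports both partitions.
	for _, e := range c.Snapshot() {
		fmt.Println(e.Key.Topic, e.Key.Partition, e.Offset) // my_topic 0 150, then my_topic 1 200
	}
}
```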

@juhoautio

juhoautio commented Dec 25, 2018

Thanks!

A use case that I’m still trying to implement with burrowx is creating charts like this one:
https://github.com/quantifind/KafkaOffsetMonitor/blob/master/README.md#history-of-topic-position

KafkaOffsetMonitor only shows these at the topic level. It would be nice to be able to draw them at both the topic and the partition level, although the topic level is not a must. I think this should be doable if the max offsets are pushed separately* (periodically, or at least when they change). A nice thing about Grafana is that charts roll forward automatically as new data arrives.

I’m not sure if it will be possible to create monitoring charts similar to this based on the data that burrowx currently stores in influxdb.

*) Have you considered pushing the max offsets of every topic & partition periodically? Even without visualizations, this would allow monitoring whether some consumers are not committing anything to partitions that have new messages.
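With periodic samples of both the max offset (logsize) and the committed offset (offsize) for every partition, the check suggested above reduces to comparing two consecutive samples. A minimal Go sketch follows. The `sample` type and `isStalled` function are hypothetical. The rule "the log grew but the committed offset did not move" is one simple interpretation of "not committing anything to partitions that have new messages".

```go
package main

import "fmt"

// sample is one periodic observation of a partition (hypothetical type):
// the log end offset ("logsize") and the group's committed offset ("offsize").
type sample struct {
	Logsize int64
	Offsize int64
}

// isStalled reports whether new messages arrived between prev and cur
// while the consumer group's committed offset stayed the same.
func isStalled(prev, cur sample) bool {
	return cur.Logsize > prev.Logsize && cur.Offsize == prev.Offsize
}

func main() {
	prev := sample{Logsize: 1000, Offsize: 990}
	fmt.Println(isStalled(prev, sample{Logsize: 1200, Offsize: 990}))  // true: log grew, no commit
	fmt.Println(isStalled(prev, sample{Logsize: 1200, Offsize: 1150})) // false: consumer committed
	fmt.Println(isStalled(prev, sample{Logsize: 1000, Offsize: 990}))  // false: no new messages
}
```

This only works if both samples exist for every partition at every interval, which is exactly why periodic pushes of all partitions matter here.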

@sundy-li
Owner Author

sundy-li commented Dec 25, 2018

@juhoautio

Yes, there are currently still some bugs blocking this feature, but I am on my wedding vacation now, so I may not have time to finish it.

You could try Kafka Eagle and kafka-monitor. I may consider a better way to support this in burrowx in the future, but no promises.

@juhoautio

Thanks for the suggestions:

  • I was browsing the docs of kafka-monitor, but couldn't find anything about consumer offset monitoring
  • Kafka Eagle might do the job.
  • However, I'm looking for a simple solution that exports to InfluxDB, and burrowx is the only active project of that kind that I've come across so far

When you have some more time, please consider the following question about fluctuating consumer offset & lag charts in Grafana:

  • When I was trying to create a dashboard (before partition-level split), I wasn't only displaying the lag, but also the committed offset value
  • The committed offset chart (on topic level) was fluctuating instead of growing continually – which we can clearly deem to be wrong (as long as consumer offsets are not reset, which was not the case)
    • I suspect that it happens because not all partitions are included in each commit
    • This is a more evident demonstration of what goes wrong when not all partitions are always committed with the same timestamp, but I think the same problem applies to lag monitoring.
    • Have you considered this issue? I think everything would be fine as long as charts are drawn at partition-level granularity, but as soon as a combined offset or lag for one or more topics is required, I don't see any way to handle it properly other than making burrowx write the consumer offset & lag of every partition at each written point in time, including the partitions whose committed offset didn't change
  • In the DemoView, consumer lag is presented across all topics and then separately for some topics, I presume.
    • A grafana.sample.json is also provided.
    • Isn't that simply displaying the mean lag at each timestamp that exists?
    • Isn't it the case that the displayed lag can fluctuate depending on how many partitions were included in each commit? If you agree, I'm curious to know how you're using burrowx, i.e. what kind of use case it is where fluctuation is not a problem.
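The fluctuation described above can be reproduced with toy numbers. This minimal Go sketch assumes two partitions whose committed offsets only ever grow, with each commit touching only some partitions. The data and function names are hypothetical. Summing only the points that exist at each timestamp jumps up and down. Carrying each partition's last known value forward, which has the same effect as writing every partition at every point in time, yields a non-decreasing topic-level offset.

```go
package main

import "fmt"

// point is one written measurement: a partition's committed offset at time T.
type point struct {
	T         int64
	Partition int32
	Offset    int64
}

// sumPresent sums only the partitions that have a point at each timestamp,
// roughly what a naive topic-level aggregation does with sparse data.
// Points must be sorted by T.
func sumPresent(points []point) []int64 {
	var out []int64
	for i := 0; i < len(points); {
		t, sum := points[i].T, int64(0)
		for ; i < len(points) && points[i].T == t; i++ {
			sum += points[i].Offset
		}
		out = append(out, sum)
	}
	return out
}

// sumCarried carries each partition's last known offset forward, so every
// timestamp includes every partition seen so far. Points must be sorted by T.
func sumCarried(points []point) []int64 {
	last := make(map[int32]int64)
	var out []int64
	for i := 0; i < len(points); {
		t := points[i].T
		for ; i < len(points) && points[i].T == t; i++ {
			last[points[i].Partition] = points[i].Offset
		}
		var sum int64
		for _, v := range last {
			sum += v
		}
		out = append(out, sum)
	}
	return out
}

func main() {
	points := []point{
		{T: 1, Partition: 0, Offset: 100}, {T: 1, Partition: 1, Offset: 500},
		{T: 2, Partition: 0, Offset: 110},
		{T: 3, Partition: 1, Offset: 520},
		{T: 4, Partition: 0, Offset: 130},
	}
	fmt.Println(sumPresent(points)) // [600 110 520 130]: fluctuates
	fmt.Println(sumCarried(points)) // [600 610 630 650]: grows monotonically
}
```

A mean over whichever partitions happen to be present (as with mean lag) is affected in the same way, because the set of partitions changes from one timestamp to the next.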

Until then, enjoy your vacation!

@sundy-li
Owner Author

sundy-li commented Jan 24, 2019

@juhoautio Sorry for the delay.

I have updated this PR, and per-partition consumer metrics for each topic seem to work now.

```
> select * from consumer_metrics where consumer_group = 'my_group' limit 10;
name: consumer_metrics
----------------------
time                 cluster    consumer_group  lag    logsize       offsize       partition  topic
1548314472000000000  sz-aliyun  my_group        -4054  145567146498  145567150552  119        my_topic
1548314523000000000  sz-aliyun  my_group        58432  145528968852  145528910420  13         my_topic
1548314533000000000  sz-aliyun  my_group        1901   145587742728  145587740827  22         my_topic
1548314543000000000  sz-aliyun  my_group        933    145426831491  145426830558  20         my_topic
1548314850000000000  sz-aliyun  my_group        -3184  145576423495  145576426679  113        my_topic
1548314860000000000  sz-aliyun  my_group        -9086  145579608039  145579617125  107        my_topic
1548314870000000000  sz-aliyun  my_group        -2641  145435290501  145435293142  20         my_topic
```
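In every row of this output, `lag` equals `logsize - offsize`. For example, partition 119 gives 145567146498 - 145567150552 = -4054. Rows where the committed offset is ahead of the recorded log size therefore come out negative. A plausible but unconfirmed cause is that the two offsets are sampled at slightly different moments. The minimal Go sketch below restates that relation and adds a hypothetical `clampedLag` variant that reports 0 instead of a negative value. Clamping is an assumption of this sketch, not something the PR does.

```go
package main

import "fmt"

// lag returns logsize - offsize, the relation visible in every row above.
func lag(logsize, offsize int64) int64 {
	return logsize - offsize
}

// clampedLag is a hypothetical variant that reports 0 when the committed
// offset is ahead of the sampled log size, instead of a negative lag.
func clampedLag(logsize, offsize int64) int64 {
	if l := lag(logsize, offsize); l > 0 {
		return l
	}
	return 0
}

func main() {
	fmt.Println(lag(145567146498, 145567150552))        // -4054 (partition 119)
	fmt.Println(lag(145528968852, 145528910420))        // 58432 (partition 13)
	fmt.Println(clampedLag(145567146498, 145567150552)) // 0
}
```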

Could you have a look at this? Just check out this branch and run some tests.

@juhoautio

@sundy-li thanks for the update. Could you describe a bit more what your latest change does?

Also, sorry that I forgot to mention it before, but we found a solution based on KafkaOffsetMonitor, which we have been using: https://github.com/quantifind/KafkaOffsetMonitor (a dead project, but it works). To get what we need, we only had to write this reporter plugin:
https://gist.github.com/juhoautio/326acff2c34cd45a32af0a375257ba22

  • This works quite well because the offsetInfoSeq: IndexedSeq[OffsetGetter.OffsetInfo] always includes all topics & partitions of a single consumer group even if there were new commits only to some partitions.
  • With this in place we're not actively looking into using burrowx any more. I hope my feedback on burrowx has been useful though.

@sundy-li
Owner Author

I now use the OffsetFetchRequest API to fetch the consumer offsets instead of consuming the __consumer_offsets topic.
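Polling committed offsets explicitly, instead of waiting for commit events, returns a value for every requested partition on every poll. That is what makes the per-partition series dense. The minimal Go sketch below shows such a loop. To let it run without a broker, the Kafka calls sit behind a hypothetical `offsetSource` interface. A real collector would presumably answer `Committed` with an OffsetFetchRequest to the group coordinator and `Newest` with a log-end-offset lookup. The interface, types, and the in-memory `fakeSource` are illustrative assumptions, not burrowx code.

```go
package main

import "fmt"

// offsetSource hides the Kafka calls (hypothetical interface). Committed is
// where an OffsetFetchRequest would go; Newest is a log-end-offset lookup.
type offsetSource interface {
	Committed(group, topic string, partition int32) (int64, error)
	Newest(topic string, partition int32) (int64, error)
}

// partitionMetrics is one row of a poll result.
type partitionMetrics struct {
	Partition        int32
	Logsize, Offsize int64
}

// poll queries every partition explicitly, so the result covers all of them
// even if only some were committed since the previous poll.
func poll(src offsetSource, group, topic string, partitions []int32) ([]partitionMetrics, error) {
	out := make([]partitionMetrics, 0, len(partitions))
	for _, p := range partitions {
		committed, err := src.Committed(group, topic, p)
		if err != nil {
			return nil, fmt.Errorf("committed offset %s/%d: %w", topic, p, err)
		}
		newest, err := src.Newest(topic, p)
		if err != nil {
			return nil, fmt.Errorf("newest offset %s/%d: %w", topic, p, err)
		}
		out = append(out, partitionMetrics{Partition: p, Logsize: newest, Offsize: committed})
	}
	return out, nil
}

// fakeSource is an in-memory stand-in for a broker, for demonstration only.
type fakeSource struct {
	committed, newest map[int32]int64
}

func (f fakeSource) Committed(_, _ string, p int32) (int64, error) { return f.committed[p], nil }
func (f fakeSource) Newest(_ string, p int32) (int64, error)       { return f.newest[p], nil }

func main() {
	src := fakeSource{
		committed: map[int32]int64{0: 90, 1: 40},
		newest:    map[int32]int64{0: 100, 1: 40},
	}
	rows, err := poll(src, "my_group", "my_topic", []int32{0, 1})
	if err != nil {
		panic(err)
	}
	for _, r := range rows {
		fmt.Printf("partition=%d logsize=%d offsize=%d lag=%d\n",
			r.Partition, r.Logsize, r.Offsize, r.Logsize-r.Offsize)
	}
}
```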

@sundy-li sundy-li merged commit f9594aa into master Jan 24, 2019