kafka_consumer plugin can't get full data #2845

Closed
keyboardfann opened this issue May 23, 2017 · 9 comments
Labels
area/kafka bug unexpected problem or unintended behavior

Comments

@keyboardfann

keyboardfann commented May 23, 2017

Bug report

I published 20000 metrics to Kafka, but the Telegraf kafka_consumer plugin received only 13344 of them.
It seems the consumer does not attach to and consume from every partition.

Relevant telegraf.conf:

[global_tags]
[agent]
  interval = "60s"
  round_interval = true
  metric_batch_size = 2000
  metric_buffer_limit = 20000
  collection_jitter = "0s"
  flush_interval = "60s"
  flush_jitter = "0s"
  precision = ""
  debug = true
  quiet = false
  logfile = "/var/log/telegraf/telegraf.log"
  hostname = ""
  omit_hostname = true

[[outputs.graphite]]
  servers = ["a:2013","b:2013"]
  prefix = "stats"
  template = "host.tags.measurement.field"
  timeout = 2


[[inputs.kafka_consumer]]
  topics = ["test6"]
  zookeeper_peers = ["a:2181","b:2181","c:2181"]
  zookeeper_chroot = ""
  consumer_group = "telegraf_metrics_consumers"
  offset = "newest"
  data_format = "graphite"

System info:

Confluent Kafka 0.10.2
telegraf 1.3.0

Steps to reproduce:

  1. Push data to Kafka
  2. Start Telegraf to consume the messages and output them to Graphite

Expected behavior:

Telegraf should receive 20000 metrics

Actual behavior:

Telegraf receives only 13344 metrics

Additional info:

kafka topic info on kafka-manager
image

Consumer Info on kafka-manager
image

telegraf.log, showing that only 13344 metrics were received:

2017-05-23T10:27:00Z D! Output [graphite] buffer fullness: 0 / 20000 metrics. 
2017-05-23T10:28:00Z D! Output [graphite] buffer fullness: 399 / 20000 metrics. 
2017-05-23T10:28:00Z D! Output [graphite] wrote batch of 399 metrics in 3.745281ms
2017-05-23T10:28:00Z D! Output [graphite] wrote batch of 2000 metrics in 10.369182ms
2017-05-23T10:29:00Z D! Output [graphite] wrote batch of 2000 metrics in 12.531463ms
2017-05-23T10:29:00Z D! Output [graphite] wrote batch of 2000 metrics in 16.626827ms
2017-05-23T10:29:00Z D! Output [graphite] buffer fullness: 202 / 20000 metrics. 
2017-05-23T10:29:00Z D! Output [graphite] wrote batch of 202 metrics in 1.224102ms
2017-05-23T10:30:00Z D! Output [graphite] buffer fullness: 1933 / 20000 metrics. 
2017-05-23T10:30:00Z D! Output [graphite] wrote batch of 1933 metrics in 18.08751ms
2017-05-23T10:31:00Z D! Output [graphite] buffer fullness: 1424 / 20000 metrics. 
2017-05-23T10:31:00Z D! Output [graphite] wrote batch of 1424 metrics in 10.366979ms
2017-05-23T10:32:00Z D! Output [graphite] wrote batch of 2000 metrics in 11.628594ms
2017-05-23T10:32:00Z D! Output [graphite] buffer fullness: 1376 / 20000 metrics. 
2017-05-23T10:32:00Z D! Output [graphite] wrote batch of 1376 metrics in 6.473589ms
2017-05-23T10:33:00Z D! Output [graphite] buffer fullness: 0 / 20000 metrics.
@keyboardfann
Author

It seems the problem only happens when the topic has more than one partition.

Topic 8 with 1 partition and a replication factor of 1
image

Consumer Info
image

telegraf.log, showing that all 20000 metrics were received:

2017-05-23T10:55:00Z D! Output [graphite] buffer fullness: 600 / 20000 metrics. 
2017-05-23T10:55:00Z D! Output [graphite] wrote batch of 600 metrics in 5.246315ms
2017-05-23T10:55:00Z D! Output [graphite] wrote batch of 2000 metrics in 11.651002ms
2017-05-23T10:55:00Z D! Output [graphite] wrote batch of 2000 metrics in 48.714389ms
2017-05-23T10:56:00Z D! Output [graphite] wrote batch of 2000 metrics in 8.952067ms
2017-05-23T10:56:00Z D! Output [graphite] wrote batch of 2000 metrics in 12.621124ms
2017-05-23T10:56:00Z D! Output [graphite] buffer fullness: 1200 / 20000 metrics. 
2017-05-23T10:56:00Z D! Output [graphite] wrote batch of 1200 metrics in 5.05349ms
2017-05-23T10:57:00Z D! Output [graphite] wrote batch of 2000 metrics in 18.312017ms
2017-05-23T10:57:00Z D! Output [graphite] buffer fullness: 1800 / 20000 metrics. 
2017-05-23T10:57:00Z D! Output [graphite] wrote batch of 1800 metrics in 46.187326ms
2017-05-23T10:58:00Z D! Output [graphite] buffer fullness: 1374 / 20000 metrics. 
2017-05-23T10:58:00Z D! Output [graphite] wrote batch of 1374 metrics in 7.627821ms
2017-05-23T10:59:00Z D! Output [graphite] wrote batch of 2000 metrics in 10.180491ms
2017-05-23T10:59:00Z D! Output [graphite] wrote batch of 2000 metrics in 16.012394ms
2017-05-23T10:59:00Z D! Output [graphite] buffer fullness: 1026 / 20000 metrics. 
2017-05-23T10:59:00Z D! Output [graphite] wrote batch of 1026 metrics in 11.660007ms
2017-05-23T11:00:00Z D! Output [graphite] buffer fullness: 0 / 20000 metrics. 
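
To cross-check a total like this against the log, the `wrote batch of N metrics` lines can be summed. The following is a minimal Python sketch and is not part of Telegraf. The function name `count_written_metrics` is hypothetical, and the sample lines are copied from the single-partition log above (only two of the `buffer fullness` lines are included, to show that they are ignored).

```python
import re

# Lines copied from the single-partition telegraf.log excerpt above.
LOG = """\
2017-05-23T10:55:00Z D! Output [graphite] buffer fullness: 600 / 20000 metrics.
2017-05-23T10:55:00Z D! Output [graphite] wrote batch of 600 metrics in 5.246315ms
2017-05-23T10:55:00Z D! Output [graphite] wrote batch of 2000 metrics in 11.651002ms
2017-05-23T10:55:00Z D! Output [graphite] wrote batch of 2000 metrics in 48.714389ms
2017-05-23T10:56:00Z D! Output [graphite] wrote batch of 2000 metrics in 8.952067ms
2017-05-23T10:56:00Z D! Output [graphite] wrote batch of 2000 metrics in 12.621124ms
2017-05-23T10:56:00Z D! Output [graphite] wrote batch of 1200 metrics in 5.05349ms
2017-05-23T10:57:00Z D! Output [graphite] wrote batch of 2000 metrics in 18.312017ms
2017-05-23T10:57:00Z D! Output [graphite] wrote batch of 1800 metrics in 46.187326ms
2017-05-23T10:58:00Z D! Output [graphite] wrote batch of 1374 metrics in 7.627821ms
2017-05-23T10:59:00Z D! Output [graphite] wrote batch of 2000 metrics in 10.180491ms
2017-05-23T10:59:00Z D! Output [graphite] wrote batch of 2000 metrics in 16.012394ms
2017-05-23T10:59:00Z D! Output [graphite] wrote batch of 1026 metrics in 11.660007ms
2017-05-23T11:00:00Z D! Output [graphite] buffer fullness: 0 / 20000 metrics.
"""

# Matches only the "wrote batch" lines; "buffer fullness" lines are skipped.
BATCH_RE = re.compile(r"wrote batch of (\d+) metrics")


def count_written_metrics(log_text: str) -> int:
    """Sum the batch sizes reported by the output plugin (hypothetical helper)."""
    return sum(int(match.group(1)) for match in BATCH_RE.finditer(log_text))


if __name__ == "__main__":
    print(count_written_metrics(LOG))  # prints 20000
```

Counting only the `wrote batch` lines avoids double counting, since each `buffer fullness` value is written out again in the batch that follows it.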

@keyboardfann
Author

I started two Telegraf instances; together they attach to all partitions and receive all the data. So the problem seems to occur when the topic has multiple partitions and only one Telegraf instance is running.
image

@danielnelson
Contributor

I'm not very familiar with this plugin yet, but you should also be able to use a single Telegraf with two [[inputs.kafka_consumer]] inputs.

@keyboardfann
Author

Dear @danielnelson ,
Thank you for the reply. But configuring two [[inputs.kafka_consumer]] inputs seems very strange. As far as I know, the consumer should detect which partitions are not attached and attach to them automatically.
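
The expectation here follows from how Kafka consumer groups work: each partition of a topic is assigned to one consumer in the group, so a lone consumer should end up owning every partition. The Python sketch below only illustrates that invariant with a simple round-robin split. It is not the plugin's or Kafka's actual assignment code; the names `assign_partitions`, `unassigned`, `telegraf-1`, and `telegraf-2`, and the partition count of 3, are assumptions made for the example.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions over consumers round-robin (illustrative only)."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    assignment: Dict[str, List[int]] = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


def unassigned(partitions: List[int], assignment: Dict[str, List[int]]) -> List[int]:
    """Return partitions that no consumer owns; a healthy group has none."""
    owned = {p for owned_by_one in assignment.values() for p in owned_by_one}
    return sorted(set(partitions) - owned)


if __name__ == "__main__":
    partitions = [0, 1, 2]  # assumed partition count, not taken from the issue

    single = assign_partitions(partitions, ["telegraf-1"])
    print(single)                          # {'telegraf-1': [0, 1, 2]}
    print(unassigned(partitions, single))  # []

    pair = assign_partitions(partitions, ["telegraf-1", "telegraf-2"])
    print(pair)                            # {'telegraf-1': [0, 2], 'telegraf-2': [1]}
```

The behavior reported in this issue corresponds to `unassigned` returning a non-empty list when only one consumer is running.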

@danielnelson
Contributor

Yes, I agree that this should be fixed; the two-input setup is just a possible workaround so that you don't need two processes for now. We are planning to merge #2487 before the 1.4 release; perhaps that will fix this issue?

@danielnelson added the area/kafka and bug (unexpected problem or unintended behavior) labels on May 26, 2017
@keyboardfann
Author

Dear @danielnelson ,
Thank you for the help; that is good news. I hope 1.4 fixes it, and I can help test again.

@russorat
Contributor

@keyboardfann can you confirm that #2487 fixed this issue? If so, feel free to close.

@keyboardfann
Author

Hi @russorat,
Let me confirm whether the issue has been resolved; I will post an update later.

@keyboardfann
Author

I tested with 10 partitions and 1 Telegraf instance, and the bug seems to be fixed. I am closing the ticket; if I hit the problem again, I will reopen it.
image

image
