Missing metrics when metric_buffer_limit fullness = 100/300 metrics #4960

Closed
canimus opened this issue Nov 5, 2018 · 3 comments

canimus commented Nov 5, 2018

Reproduction:

  • Spawn 2 containers, one with telegraf and one with influxdb
  • Simple configuration with 2 global tags, 1 input, 10 fields, 1 output
  • Collections scheduled every 10s
  • metric_batch_size = 100
  • metric_buffer_limit = 300
  • docker-compose.yml included

Expected Behavior:

  • After influxdb was shut down manually to confirm that the buffer fills up in telegraf, unexpected behavior appeared once the buffer fullness came close to metric_batch_size (100)
  • During that period, however, metrics are lost instead of being stored in the buffer
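
For reference, a rough sketch of how long an outage this configuration would be expected to absorb. It assumes one metric per collection (consistent with the roughly one-per-10s buffer growth and the 3 points per 30s bucket shown in this report) and that every unsent metric stays buffered until metric_buffer_limit is reached. The function name is hypothetical and this is not Telegraf code.

```python
def outage_capacity_seconds(buffer_limit, metrics_per_interval, interval_seconds):
    """Seconds of output downtime the buffer can absorb before it is full.

    Assumption: every collected metric is kept until buffer_limit is reached.
    """
    if metrics_per_interval <= 0:
        raise ValueError("metrics_per_interval must be positive")
    # Number of whole collection intervals that fit into the buffer.
    full_intervals = buffer_limit // metrics_per_interval
    return full_intervals * interval_seconds


# Values from the report: metric_buffer_limit = 300, interval = "10s",
# and (assumed) 1 metric per collection.
capacity = outage_capacity_seconds(300, 1, 10)
print(capacity, "seconds, i.e.", capacity // 60, "minutes")
```

Under those assumptions the buffer should cover about 50 minutes of downtime, so an outage of a minute or two should not lose any points.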

Configuration:

  • telegraf:1.8.3-alpine
  • influxdb:1.6.4-alpine

telegraf.conf

[global_tags]
  dc = "wcr"
  env = "d1"
[agent]  
  interval = "10s"
  round_interval = true
  metric_batch_size = 100
  metric_buffer_limit = 300
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = true
  quiet = false
  logfile = ""
  hostname = ""
  omit_hostname = false


###############################################################################
#                            OUTPUT PLUGINS                                   #
###############################################################################

[[outputs.influxdb]]
  urls = ["http://influxdb:8086"]
  database = "telegraf"


###############################################################################
#                            INPUT PLUGINS                                    #
###############################################################################

[[inputs.cpu]]
  percpu = false
  totalcpu = true
  collect_cpu_time = false
  report_active = false
  tagdrop = ["usage_guest","usage_guest_nice", "usage_steal", "usage_nice"]

telegraf.log

telegraf_1    | 2018-11-05T18:21:30Z E! Error writing to output [influxdb]: could not write any address
telegraf_1    | 2018-11-05T18:21:40Z D! Output [influxdb] buffer fullness: 99 / 300 metrics. 
telegraf_1    | 2018-11-05T18:21:40Z E! [outputs.influxdb] when writing to [http://influxdb:8086]: Post http://influxdb:8086/write?db=telegraf: dial tcp: lookup influxdb on 127.0.0.11:53: server misbehaving
telegraf_1    | 2018-11-05T18:21:40Z E! Error writing to output [influxdb]: could not write any address
telegraf_1    | 2018-11-05T18:21:50Z D! Output [influxdb] buffer fullness: 100 / 300 metrics. 
telegraf_1    | 2018-11-05T18:21:50Z E! [outputs.influxdb] when writing to [http://influxdb:8086]: Post http://influxdb:8086/write?db=telegraf: dial tcp: lookup influxdb on 127.0.0.11:53: server misbehaving
telegraf_1    | 2018-11-05T18:21:50Z E! Error writing to output [influxdb]: could not write any address
telegraf_1    | 2018-11-05T18:22:00Z D! Output [influxdb] buffer fullness: 101 / 300 metrics. 
telegraf_1    | 2018-11-05T18:22:00Z E! [outputs.influxdb] when writing to [http://influxdb:8086]: Post http://influxdb:8086/write?db=telegraf: dial tcp: lookup influxdb on 127.0.0.11:53: server misbehaving
telegraf_1    | 2018-11-05T18:22:00Z E! Error writing to output [influxdb]: could not write any address
telegraf_1    | 2018-11-05T18:22:10Z D! Output [influxdb] buffer fullness: 101 / 300 metrics. 
telegraf_1    | 2018-11-05T18:22:10Z E! [outputs.influxdb] when writing to [http://influxdb:8086]: Post http://influxdb:8086/write?db=telegraf: dial tcp: lookup influxdb on 127.0.0.11:53: server misbehaving
telegraf_1    | 2018-11-05T18:22:10Z E! Error writing to output [influxdb]: could not write any address
telegraf_1    | 2018-11-05T18:22:20Z D! Output [influxdb] buffer fullness: 102 / 300 metrics. 
telegraf_1    | 2018-11-05T18:22:20Z E! [outputs.influxdb] when writing to [http://influxdb:8086]: Post http://influxdb:8086/write?db=telegraf: dial tcp: lookup influxdb on 127.0.0.11:53: server misbehaving
telegraf_1    | 2018-11-05T18:22:20Z E! Error writing to output [influxdb]: could not write any address
telegraf_1    | 2018-11-05T18:22:30Z D! Output [influxdb] buffer fullness: 103 / 300 metrics. 
telegraf_1    | 2018-11-05T18:22:30Z E! [outputs.influxdb] when writing to [http://influxdb:8086]: Post http://influxdb:8086/write?db=telegraf: dial tcp: lookup influxdb on 127.0.0.11:53: server misbehaving
telegraf_1    | 2018-11-05T18:22:30Z E! Error writing to output [influxdb]: could not write any address
telegraf_1    | 2018-11-05T18:22:40Z D! Output [influxdb] buffer fullness: 103 / 300 metrics. 
telegraf_1    | 2018-11-05T18:22:40Z E! [outputs.influxdb] when writing to [http://influxdb:8086]: Post http://influxdb:8086/write?db=telegraf: dial tcp: lookup influxdb on 127.0.0.11:53: server misbehaving
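
The stalls in the log above can be picked out mechanically. This is a minimal sketch, assuming the debug line format shown in this log and an expected growth of one metric per flush while the output is down; the helper names are hypothetical.

```python
import re

# Matches the "buffer fullness" debug lines as they appear in the log above.
FULLNESS = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z) D! "
    r"Output \[(?P<output>[^\]]+)\] "
    r"buffer fullness: (?P<used>\d+) / (?P<limit>\d+) metrics"
)


def fullness_samples(log_text):
    """Return (timestamp, used, limit) for every fullness line in log_text."""
    return [
        (m["ts"], int(m["used"]), int(m["limit"]))
        for m in FULLNESS.finditer(log_text)
    ]


def stalled_flushes(samples, expected_growth=1):
    """Return (timestamp, previous, current) where the buffer grew too little.

    A buffer that has already reached its limit is not reported.
    """
    stalls = []
    for (_, prev_used, _), (ts, used, limit) in zip(samples, samples[1:]):
        if used < limit and used - prev_used < expected_growth:
            stalls.append((ts, prev_used, used))
    return stalls


# Fullness lines copied from the log above.
LOG = """
telegraf_1    | 2018-11-05T18:21:40Z D! Output [influxdb] buffer fullness: 99 / 300 metrics.
telegraf_1    | 2018-11-05T18:21:50Z D! Output [influxdb] buffer fullness: 100 / 300 metrics.
telegraf_1    | 2018-11-05T18:22:00Z D! Output [influxdb] buffer fullness: 101 / 300 metrics.
telegraf_1    | 2018-11-05T18:22:10Z D! Output [influxdb] buffer fullness: 101 / 300 metrics.
telegraf_1    | 2018-11-05T18:22:20Z D! Output [influxdb] buffer fullness: 102 / 300 metrics.
telegraf_1    | 2018-11-05T18:22:30Z D! Output [influxdb] buffer fullness: 103 / 300 metrics.
telegraf_1    | 2018-11-05T18:22:40Z D! Output [influxdb] buffer fullness: 103 / 300 metrics.
"""

for ts, prev_used, used in stalled_flushes(fullness_samples(LOG)):
    print(f"{ts}: buffer stayed at {prev_used} -> {used}")
```

On this excerpt the sketch reports the flushes at 18:22:10 (101 to 101) and 18:22:40 (103 to 103), the two intervals in which the buffer did not grow even though the output was still unreachable.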

Attention: see the counts around 18:21 hrs in the query output below.

influx -db telegraf

> select count(usage_idle) from cpu group by time(30s)
name: cpu
time                 count
----                 -----
2018-11-05T18:04:00Z 2
2018-11-05T18:04:30Z 3
2018-11-05T18:05:00Z 3
2018-11-05T18:05:30Z 0
2018-11-05T18:06:00Z 3
2018-11-05T18:06:30Z 3
2018-11-05T18:07:00Z 3
2018-11-05T18:07:30Z 3
2018-11-05T18:08:00Z 3
2018-11-05T18:08:30Z 3
2018-11-05T18:09:00Z 3
2018-11-05T18:09:30Z 3
2018-11-05T18:10:00Z 3
2018-11-05T18:10:30Z 3
2018-11-05T18:11:00Z 3
2018-11-05T18:11:30Z 3
2018-11-05T18:12:00Z 3
2018-11-05T18:12:30Z 3
2018-11-05T18:13:00Z 3
2018-11-05T18:13:30Z 3
2018-11-05T18:14:00Z 3
2018-11-05T18:14:30Z 3
2018-11-05T18:15:00Z 3
2018-11-05T18:15:30Z 3
2018-11-05T18:16:00Z 3
2018-11-05T18:16:30Z 3
2018-11-05T18:17:00Z 3
2018-11-05T18:17:30Z 3
2018-11-05T18:18:00Z 3
2018-11-05T18:18:30Z 3
2018-11-05T18:19:00Z 3
2018-11-05T18:19:30Z 3
2018-11-05T18:20:00Z 3
2018-11-05T18:20:30Z 3
2018-11-05T18:21:00Z 3
2018-11-05T18:21:30Z 2
2018-11-05T18:22:00Z 1
2018-11-05T18:22:30Z 2
2018-11-05T18:23:00Z 2
2018-11-05T18:23:30Z 3
2018-11-05T18:24:00Z 3
2018-11-05T18:24:30Z 3
2018-11-05T18:25:00Z 3
2018-11-05T18:25:30Z 3
2018-11-05T18:26:00Z 3
2018-11-05T18:26:30Z 3
2018-11-05T18:27:00Z 3
2018-11-05T18:27:30Z 3
2018-11-05T18:28:00Z 3
2018-11-05T18:28:30Z 3
2018-11-05T18:29:00Z 3
2018-11-05T18:29:30Z 3
2018-11-05T18:30:00Z 3
2018-11-05T18:30:30Z 3
2018-11-05T18:31:00Z 3
2018-11-05T18:31:30Z 3
2018-11-05T18:32:00Z 3
2018-11-05T18:32:30Z 3
2018-11-05T18:33:00Z 3
2018-11-05T18:33:30Z 3
2018-11-05T18:34:00Z 3
2018-11-05T18:34:30Z 3
2018-11-05T18:35:00Z 3
2018-11-05T18:35:30Z 3
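
The short buckets in the query output above can be listed with a small helper. This is a minimal sketch, assuming the two-column `influx` CLI layout shown above and 3 expected points per 30s bucket (one collection every 10s); the function name is hypothetical.

```python
def short_buckets(cli_output, expected=3):
    """Return (timestamp, count) for rows whose count is below expected.

    Header and separator rows are skipped because their second column
    is not a number.
    """
    short = []
    for line in cli_output.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        timestamp, count = parts[0], int(parts[1])
        if count < expected:
            short.append((timestamp, count))
    return short


# Rows copied from the query output above, around the outage.
ROWS = """
name: cpu
time                 count
----                 -----
2018-11-05T18:21:00Z 3
2018-11-05T18:21:30Z 2
2018-11-05T18:22:00Z 1
2018-11-05T18:22:30Z 2
2018-11-05T18:23:00Z 2
2018-11-05T18:23:30Z 3
"""

gaps = short_buckets(ROWS)
missing = sum(3 - count for _, count in gaps)
print(gaps)
print("missing points:", missing)
```

For this excerpt the four buckets from 18:21:30 to 18:23:00 come up short, 5 points in total. When run over the full output, the first bucket may be short simply because collection started partway through it.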
@danielnelson danielnelson self-assigned this Nov 5, 2018
@danielnelson danielnelson added this to the 1.9.0 milestone Nov 5, 2018
@danielnelson danielnelson added the bug unexpected problem or unintended behavior label Nov 5, 2018
@danielnelson
Contributor

I'm not sure what the exact cause of this is in <1.8, but it appears to be fixed in 1.9 by #4938.

@canimus
Author

canimus commented Nov 6, 2018

@danielnelson thanks for the prompt reply. Just to confirm: you highlighted <1.8, but the affected version is actually 1.8.3.
Is there a target date for the promotion from 1.9-rc1 to 1.9.0? Thanks.

@danielnelson
Contributor

Yes, to be more precise, this seems to affect versions earlier than 1.9.0-rc1. We usually aim for a release candidate process of about 2 weeks, but it does depend on what issues are found.
