Prometheus remote write serialiser drops buffered metrics #11682
Comments
next steps: review PR
Any updates on this?
As the next steps state, take a look at the discussion on the PR in #11683, specifically this comment.
I followed the other related threads and it seems like some work was done, but we recently confirmed that this is still happening with v1.28.1 when we write to Prometheus directly from Telegraf with batching enabled. With batching disabled, no metrics are dropped, but I'm not sure we'd be happy to stay with that workaround. I also tried the …
AFAIR another workaround is to use a small batch size: we did some tests, and it seems that if the batch size is smaller than the number of collected metrics/series, metrics are not dropped. Still far from ideal, but better than no batching at all.
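A minimal telegraf.conf sketch of that workaround (the value 250 is an assumption; pick something below the number of unique series you collect per interval):

```toml
[agent]
  ## Assumed value: keep the batch size below the per-interval series count,
  ## per the workaround described above.
  metric_batch_size = 250
```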
This bug should still be fixed, but in the meantime we're going to trial influx2cortex and just get our Telegrafs to write using the influxdb output.
Relevant telegraf.conf
Logs from Telegraf
I wrote a sample Python HTTP server that accepts the remote-write protobuf payloads and prints them to stdout (ignore the 1-hour difference in timestamps; Python was logging local time):
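The original Python script is not included above. As a reference point only, a rough sketch of such a receiver in Go (the port, endpoint path, and package choices are assumptions, not what was actually used) could look like this:

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"

	"github.com/gogo/protobuf/proto"
	"github.com/golang/snappy"
	"github.com/prometheus/prometheus/prompb"
)

func main() {
	// Endpoint path and port are arbitrary choices for this sketch.
	http.HandleFunc("/receive", func(w http.ResponseWriter, r *http.Request) {
		compressed, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// Remote-write bodies are snappy-compressed protobuf WriteRequests.
		raw, err := snappy.Decode(nil, compressed)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		var req prompb.WriteRequest
		if err := proto.Unmarshal(raw, &req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// Print every sample of every series so dropped samples become visible.
		for _, ts := range req.Timeseries {
			for _, s := range ts.Samples {
				fmt.Printf("%v value=%g ts=%d\n", ts.Labels, s.Value, s.Timestamp)
			}
		}
	})
	log.Fatal(http.ListenAndServe(":9201", nil))
}
```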
System info
All platforms, all versions of telegraf with prometheusremotewrite serialiser
Docker
No response
Steps to reproduce
Note that this may be difficult to observe in a healthy production scenario. However, it is very apparent when you experience a remote-write endpoint outage that lasts long enough for Cortex/Mimir to show large gaps in your time series. When the remote-write endpoint is eventually restored, Telegraf appears to write all metrics in the buffer out to the endpoint, but when you look at the time series data you will observe large gaps.
Expected behavior
The prometheus remote write serialiser should collate all buffered metric samples for sending to the remote write endpoint. This is in contrast to the prometheus client output plugin, which only needs to expose the latest sample of each metric for scraping.
The primary scenario where this matters is when the remote write endpoint is unreachable. Telegraf will buffer the metrics so that when the remote write endpoint is reachable again, all previous metrics in the buffer are sent to the remote write endpoint.
Actual behavior
Telegraf only sends one sample of each time series in each batch of metrics. The prometheus remote write serialiser incorrectly assumes that, in any given batch of metrics, only the latest sample of each series should be sent. All older samples for a given metric are dropped, causing the majority of metric samples buffered during a remote write endpoint outage to be lost. The actual number of samples lost depends on the maximum number of metrics per batch and the number of unique time series per batch.
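To illustrate with assumed numbers (not taken from this report): with metric_batch_size = 1000 and 100 unique time series per batch, each serialised write carries at most 100 samples, so up to 900 of every 1000 buffered samples can be silently dropped.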
Additional info
The core logic causing the issue is located here:
https://github.com/influxdata/telegraf/blob/master/plugins/serializers/prometheusremotewrite/prometheusremotewrite.go#L198
The serialiser builds a map of time series where each metric key holds a timeseries object containing only a single sample. If a metric with a newer sample is found in the batch, it replaces the older one, and all older samples are silently lost.
I am working on a PR to correct this behaviour so that a given metric time series contains all samples in the batch, ordered chronologically.
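A minimal, self-contained sketch of the intended behaviour (simplified stand-in types and invented sample data, not the actual serializer or PR code):

```go
package main

import (
	"fmt"
	"sort"
)

// sample is a simplified stand-in for prompb.Sample.
type sample struct {
	value     float64
	timestamp int64 // milliseconds
}

func main() {
	// Three samples of the same series, as would accumulate in the buffer
	// during a remote-write endpoint outage (values are invented).
	buffered := []struct {
		seriesKey string
		s         sample
	}{
		{`cpu_usage_idle{cpu="cpu-total"}`, sample{98.1, 1000}},
		{`cpu_usage_idle{cpu="cpu-total"}`, sample{97.4, 2000}},
		{`cpu_usage_idle{cpu="cpu-total"}`, sample{96.9, 3000}},
	}

	// Current behaviour: one sample per series key; older samples are overwritten.
	latestOnly := map[string]sample{}
	for _, m := range buffered {
		if cur, ok := latestOnly[m.seriesKey]; !ok || m.s.timestamp > cur.timestamp {
			latestOnly[m.seriesKey] = m.s
		}
	}

	// Proposed behaviour: keep every sample of every series, ordered chronologically.
	allSamples := map[string][]sample{}
	for _, m := range buffered {
		allSamples[m.seriesKey] = append(allSamples[m.seriesKey], m.s)
	}
	for key := range allSamples {
		samples := allSamples[key]
		sort.Slice(samples, func(i, j int) bool {
			return samples[i].timestamp < samples[j].timestamp
		})
	}

	key := buffered[0].seriesKey
	fmt.Println("samples kept (latest-only grouping):", len(latestOnly))      // 1: only the newest sample survives
	fmt.Println("samples kept (all samples, sorted): ", len(allSamples[key])) // 3
}
```

Running the sketch keeps all three buffered samples of the series for serialisation, whereas the latest-only grouping keeps just one.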