receive: internal server error when it receives metadata of metrics in remote write requests (Prometheus >=2.24) #3772

Closed
mxmorin opened this issue Feb 5, 2021 · 11 comments

Comments

@mxmorin

mxmorin commented Feb 5, 2021

Thanos, Prometheus and Golang version used:

thanos, version 0.16.0 (branch: HEAD, revision: dc6a1666bb68cd5c11be54452842e823f57668c5)
  build user:       root@ba806318d94d
  build date:       20201026-13:56:53
  go version:       go1.15

Object Storage Provider:
S3

What happened:
After upgrading Prometheus from version 2.20 to 2.24, internal server errors suddenly appeared in the logs with no other information (even in debug mode).
I downgraded the Prometheus version to 2.20 and everything went back to normal.

Is Thanos compatible with Prometheus 2.24?

Full logs to relevant components:

Logs

Jan 28 06:13:22 p1thanos01 thanos[50433]: level=error ts=2021-01-28T05:13:22.69662862Z caller=handler.go:331 component=receive component=receive-handler err= msg="internal server error"
Jan 28 06:13:22 p1thanos01 thanos[50433]: level=error ts=2021-01-28T05:13:22.797967408Z caller=handler.go:331 component=receive component=receive-handler err= msg="internal server error"
Jan 28 06:13:22 p1thanos01 thanos[50433]: level=error ts=2021-01-28T05:13:22.899356197Z caller=handler.go:331 component=receive component=receive-handler err= msg="internal server error"
Jan 28 06:13:23 p1thanos01 thanos[50433]: level=error ts=2021-01-28T05:13:23.000757915Z caller=handler.go:331 component=receive component=receive-handler err= msg="internal server error"
Jan 28 06:13:23 p1thanos01 thanos[50433]: level=error ts=2021-01-28T05:13:23.102497326Z caller=handler.go:331 component=receive component=receive-handler err= msg="internal server error"
Jan 28 06:13:23 p1thanos01 thanos[50433]: level=error ts=2021-01-28T05:13:23.20422861Z caller=handler.go:331 component=receive component=receive-handler err= msg="internal server error"
Jan 28 06:13:23 p1thanos01 thanos[50433]: level=error ts=2021-01-28T05:13:23.305524863Z caller=handler.go:331 component=receive component=receive-handler err= msg="internal server error"
Jan 28 06:13:23 p1thanos01 thanos[50433]: level=error ts=2021-01-28T05:13:23.406696696Z caller=handler.go:331 component=receive component=receive-handler err= msg="internal server error"

Environment:

  • OS (e.g. from /etc/os-release): CentOS 8
@kakkoyun
Member

@mxmorin Thanks for reporting. This shouldn't have happened; there aren't any changes to the API or data model in Prometheus between these releases. That being said, we need to take a closer look.

Are you sure this happened because of the version upgrade? Also could you please provide more information about your stack?

@kakkoyun kakkoyun changed the title from "[RECEIVE] internal server error with Prometheus 2.24" to "receive: Internal server error with Prometheus 2.24" on Feb 10, 2021
@mxmorin
Author

mxmorin commented Feb 10, 2021

Thank you for your response.
We have two Prometheus 2.20 instances (replicas) in OpenShift and one Prometheus 2.20 instance in a VM.
All of these instances are configured with remote_write to two load-balanced Thanos receiver instances backed by S3 storage.

This has been working for some time.

I updated Prometheus to 2.24 and many internal server errors appeared in the Thanos logs, but the metrics still seem to be uploaded correctly; I only see small gaps in the Grafana graphs.
I downgraded Prometheus to 2.20, and all the errors disappeared, as did the gaps in the graphs.
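For reference, a minimal sketch of the setup described above (the endpoint, bucket name, and file paths are placeholders, not taken from this issue):

# prometheus.yml on each Prometheus instance (hypothetical load-balanced receive endpoint)
remote_write:
  - url: http://thanos-receive.example.com:19291/api/v1/receive

# Each Thanos receiver behind the load balancer, with S3 object storage
thanos receive \
  --remote-write.address=0.0.0.0:19291 \
  --tsdb.path=/var/thanos/receive \
  --objstore.config-file=bucket.yml

# bucket.yml (placeholder values)
type: S3
config:
  bucket: "thanos-metrics"
  endpoint: "s3.example.com"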

@GiedriusS
Member

Does #3815 help with your problem?

@roidelapluie

Could that be because of the metric metadata being sent (new in Prometheus 2.23)?

Can you try Prometheus 2.24 with:

metadata_config:
  send: false

in your remote write config?
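For clarity, metadata_config is nested under each remote_write entry, so the full stanza would look roughly like this (the receiver URL is a placeholder):

remote_write:
  - url: http://thanos-receive.example.com:19291/api/v1/receive
    metadata_config:
      send: false

Setting send: false stops Prometheus from including metric metadata in remote write requests, which is what the receiver appears to be rejecting here.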

@universam1

Can Thanos be patched to ignore those fields for now?
There is also no option to disable this via the Operator: prometheus-operator/prometheus-operator#3889

@roidelapluie

Could you confirm that this is the cause of the issue?

@universam1

Could you confirm that this is the cause of the issue?

Unfortunately, I can only prove that downgrading to 0.22.2 solves the issue.

@mxmorin
Author

mxmorin commented Mar 8, 2021

I've tested in my staging environment with Prometheus 2.25 and Thanos 0.16 and got internal server errors again.
I applied @roidelapluie's suggestion (adding metadata_config in prometheus.yml) and the errors disappeared.

So I can confirm this is due to metadata.

@GiedriusS GiedriusS changed the title from "receive: Internal server error with Prometheus 2.24" to "receive: internal server error when it receives metadata of metrics in remote write requests (Prometheus >=2.24)" on Mar 8, 2021
@bwplotka
Member

BTW, this is something we've seen before, and we fixed it in v0.19.0-rc.1 and later. Can you try that version? 🤗

Fix: #3836

@bwplotka
Member

Let us know if you still see this issue on the newest version; it appears to be fixed on our clusters.

@yeya24
Contributor

yeya24 commented Mar 10, 2021

I've tested in my staging environment with Prometheus 2.25 and Thanos 0.16 and got internal server errors again.
I applied @roidelapluie's suggestion (adding metadata_config in prometheus.yml) and the errors disappeared.

So I can confirm this is due to metadata.

I double-checked and this is resolved in the latest release by #3815.
