receive: internal server error when it receives metadata of metrics in remote write requests (Prometheus >=2.24) #3772

Closed
mxmorin opened this issue Feb 5, 2021 · 11 comments

Comments

@mxmorin

mxmorin commented Feb 5, 2021

Thanos, Prometheus and Golang version used:

thanos, version 0.16.0 (branch: HEAD, revision: dc6a1666bb68cd5c11be54452842e823f57668c5)
  build user:       root@ba806318d94d
  build date:       20201026-13:56:53
  go version:       go1.15

Object Storage Provider:
S3

What happened:
After upgrading Prometheus from version 2.20 to 2.24, internal server errors suddenly appeared in the logs with no other information (even in debug mode).
I downgraded the Prometheus version to 2.20 and everything went back to normal.

Is Thanos compatible with Prometheus 2.24?

Full logs to relevant components:

Logs

Jan 28 06:13:22 p1thanos01 thanos[50433]: level=error ts=2021-01-28T05:13:22.69662862Z caller=handler.go:331 component=receive component=receive-handler err= msg="internal server error"
Jan 28 06:13:22 p1thanos01 thanos[50433]: level=error ts=2021-01-28T05:13:22.797967408Z caller=handler.go:331 component=receive component=receive-handler err= msg="internal server error"
Jan 28 06:13:22 p1thanos01 thanos[50433]: level=error ts=2021-01-28T05:13:22.899356197Z caller=handler.go:331 component=receive component=receive-handler err= msg="internal server error"
Jan 28 06:13:23 p1thanos01 thanos[50433]: level=error ts=2021-01-28T05:13:23.000757915Z caller=handler.go:331 component=receive component=receive-handler err= msg="internal server error"
Jan 28 06:13:23 p1thanos01 thanos[50433]: level=error ts=2021-01-28T05:13:23.102497326Z caller=handler.go:331 component=receive component=receive-handler err= msg="internal server error"
Jan 28 06:13:23 p1thanos01 thanos[50433]: level=error ts=2021-01-28T05:13:23.20422861Z caller=handler.go:331 component=receive component=receive-handler err= msg="internal server error"
Jan 28 06:13:23 p1thanos01 thanos[50433]: level=error ts=2021-01-28T05:13:23.305524863Z caller=handler.go:331 component=receive component=receive-handler err= msg="internal server error"
Jan 28 06:13:23 p1thanos01 thanos[50433]: level=error ts=2021-01-28T05:13:23.406696696Z caller=handler.go:331 component=receive component=receive-handler err= msg="internal server error"

Environment:

  • OS (e.g. from /etc/os-release): CentOS 8
@kakkoyun
Member

@mxmorin Thanks for reporting. This shouldn't have happened; there aren't any changes to the API or data model in Prometheus between these releases. That being said, we need to take a closer look.

Are you sure this happened because of the version upgrade? Also could you please provide more information about your stack?

@kakkoyun kakkoyun changed the title from "[RECEIVE] internal server error with Prometheus 2.24" to "receive: Internal server error with Prometheus 2.24" on Feb 10, 2021
@mxmorin
Author

mxmorin commented Feb 10, 2021

Thank you for your response.
We have two Prometheus 2.20 instances (replicas) in OpenShift and one Prometheus 2.20 instance in a VM.
All of these instances are configured with remote_write to two load-balanced Thanos receiver instances backed by S3 storage.

This has been working for some time.

I updated Prometheus to 2.24 and many internal server errors appeared in the Thanos logs, but the metrics still seem to be uploaded correctly; I only see small gaps in the Grafana graphs.
I downgraded Prometheus to 2.20, and all the errors disappeared, as did the gaps in the graphs.
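For reference, a minimal sketch of the setup described above (the endpoint, bucket name, and file paths are placeholders, not taken from this issue):

# prometheus.yml on each Prometheus instance (hypothetical load-balanced receive endpoint)
remote_write:
  - url: http://thanos-receive.example.com:19291/api/v1/receive

# Each Thanos receiver behind the load balancer, with S3 object storage
thanos receive \
  --remote-write.address=0.0.0.0:19291 \
  --tsdb.path=/var/thanos/receive \
  --objstore.config-file=bucket.yml

# bucket.yml (placeholder values)
type: S3
config:
  bucket: "thanos-metrics"
  endpoint: "s3.example.com"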

@GiedriusS
Member

Does #3815 help with your problem?

@roidelapluie

Could that be because of the metric metadata being sent (new in Prometheus 2.23)?

Can you try Prometheus 2.24 with:

metadata_config:
  send: false

in your remote write config?
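For clarity, metadata_config is nested under each remote_write entry, so the full stanza would look roughly like this (the receiver URL is a placeholder):

remote_write:
  - url: http://thanos-receive.example.com:19291/api/v1/receive
    metadata_config:
      send: false

Setting send: false stops Prometheus from including metric metadata in remote write requests, which is what the receiver appears to be rejecting here.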

@universam1

Can Thanos be patched to ignore those fields for now?
There is also no option to disable this via the Operator: prometheus-operator/prometheus-operator#3889

@roidelapluie

Could you confirm that this is the cause of the issue?

@universam1

Could you confirm that this is the cause of the issue?

Unfortunately, I can only prove that downgrading to 0.22.2 solves the issue.

@mxmorin
Author

mxmorin commented Mar 8, 2021

I've tested in my staging environment with Prometheus 2.25 and Thanos 0.16 and got internal server errors again.
I applied @roidelapluie's suggestion (adding metadata_config in prometheus.yml) and the errors disappeared.

So I can confirm this is due to metadata.

@GiedriusS GiedriusS changed the title from "receive: Internal server error with Prometheus 2.24" to "receive: internal server error when it receives metadata of metrics in remote write requests (Prometheus >=2.24)" on Mar 8, 2021
@bwplotka
Member

BTW, this is something we've seen before, and we fixed it in v0.19.0-rc.1 and later. Can you try that version? 🤗

Fix: #3836

@bwplotka
Member

Let us know if you still see this issue on the newest version; it appears to be fixed on our clusters.

@yeya24
Contributor

yeya24 commented Mar 10, 2021

I've tested in my staging environment with Prometheus 2.25 and Thanos 0.16 and got internal server errors again.
I applied @roidelapluie's suggestion (adding metadata_config in prometheus.yml) and the errors disappeared.

So I can confirm this is due to metadata.

I double-checked and this is resolved in the latest release by #3815.
