query: v0.32.1+ remote-write prometheus + layered queriers: "vector cannot contain metrics with the same labelset" #6677
Comments
Are stores and receivers version 0.32.1 too?
Yes, they match the version of the leaf thanos-query.
Maybe you could write out the labels of each series when you query?
I don't understand what you mean. The only external label we add that's different between replicas, for deduplication, is the
Which we deduplicate by specifying a replica label of
I mean, could you please execute an instant query
Same error message as in #6495.
{branch="HEAD", container="prometheus", endpoint="http-web", goarch="amd64", goos="linux", goversion="go1.20.5", instance="10.15.224.46:9090", job="ops-system/monitoring-stack-kube-prom-prometheus", location="global", namespace="ops-system", pod="prom-agent-monitoring-stack-kube-prom-prometheus-0", prometheus="ops-system/monitoring-stack-kube-prom-prometheus", prometheus_replica="prom-agent-monitoring-stack-kube-prom-prometheus-0", receive="true", region="core", replica="thanos-receive-1", revision="8ef767e396bf8445f009f945b0162fd71827f445", tags="netgo,builtinassets,stringlabels", tenant_id="k3s", version="2.45.0"} here with out dedup. This explicit fails with activated dedup. |
If I'm not wrong: if we address replica labels, we have the same instance "10.15.224.46" (one Prometheus replica, one Thanos Receive replica), right?
@farodin91 and what kind of parameters do you have on Thanos Query?
|
|
Hey, are you able to provide some offending blocks? That would help me try to reproduce and understand this!
@farodin91 is there a reason why these series have
|
Isn't the replica removed in the store gateway?
Only after compaction, I think. New blocks will still have it.
In 330h there should be a compaction.
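For context, replica labels are normally dropped by the compactor only when vertical compaction with deduplication is configured; a hedged sketch of such a setup (flag values and paths are illustrative, not from this deployment):
thanos compact \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=/etc/thanos/objstore.yaml \
  --compact.enable-vertical-compaction \
  --deduplication.replica-label=replica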
I updated all Thanos components to main-2023-09-05-d1edf74. No improvement; I see the same issue.
@GiedriusS Do you have any idea how to help here? How can I debug this issue?
Now it also fails for some queries, whether dedup is activated or not.
I'm fairly sure this might be #6702.
@MichaHoffmann Do you want to release it as 0.32.3?
This is running well for us.
After upgrading to v0.32.0 and later v0.32.1+, our setup no longer functions, returning a random labelset error. We're using a layered thanos-query setup, with the top layer being the target for thanos-frontend.
Thanos, Prometheus and Golang version used: Since v0.31, currently main-2023-08-28-32412dc
Object Storage Provider:
gcs
What happened:
When executing an instant query on our setup, after upgrade from 0.30.2 to 0.32.1+ (main), we're seeing this error:
{"status":"error","errorType":"execution","error":"vector cannot contain metrics with the same labelset"}
What you expected to happen:
❯ curl http://localhost:10902/api/v1/query --data-urlencode 'query=count(last_over_time(prometheus_build_info[5m]))'
{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1693271364.906,"580"]}]}}
A value higher than 500.
How to reproduce it (as minimally and precisely as possible):
kubectl port-forward directly to a thanos-ring pod and execute an instant query using the HTTP API. This rules out thanos-frontend as a factor.
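For example (pod name and namespace below are hypothetical):
kubectl -n ops-system port-forward pod/thanos-ring-0 10902:10902
curl http://localhost:10902/api/v1/query --data-urlencode 'query=count(last_over_time(prometheus_build_info[5m]))'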
Our Setup:
Queries are made against thanos-frontend, which passes them directly to thanos-ring. The ring component has the job of query fanout against all thanos-query targets in our fleet. It is configured like so:
thanos-query.org-monitoring.svc.clusterset.local:10901 is a GKE MultiClusterServices target. It resolves to a list of thanos-query endpoints discovered across the fleet, one per running thanos-query replica.
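The exact manifest isn't reproduced here, but a fanout of this shape would look roughly like the following (illustrative sketch only, assuming DNS-based discovery of the address above):
thanos query \
  --http-address=0.0.0.0:10902 \
  --grpc-address=0.0.0.0:10901 \
  --endpoint=dns+thanos-query.org-monitoring.svc.clusterset.local:10901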
Each discovered thanos-query is responsible for the endpoints in its own region. It is configured like so:
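Illustratively, such a per-region leaf querier might point at that region's receivers and store gateways (the endpoint names below are placeholders, not the actual manifest):
thanos query \
  --endpoint=dns+thanos-receive.monitoring.svc.cluster.local:10901 \
  --endpoint=dns+thanos-store.monitoring.svc.cluster.local:10901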
This is a layered querier setup as described in: https://thanos.io/tip/components/query.md/#global-view
Full logs to relevant components:
Anything else we need to know:
Most of our remote senders have an external label set like:
replica: prometheus-n
Some newer senders have a more specific prometheus_replica external label. We've tried playing with a mix of external labels and thanos-query settings to no avail. Reverting just the ring component to v0.30.2 resolves the issue.
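For illustration, a querier can declare several replica label names so that either kind of sender deduplicates; a sketch (not necessarily our exact flags):
thanos query \
  --query.replica-label=replica \
  --query.replica-label=prometheus_replica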
Also, sometimes the query works if the duration is kept really short: