Large external label set in store gateway resulted in empty announced label set and data range in Querier #6375
The issue turned out to be related to the relabel config and not really to the size of the label set.
-
Hi All,
Recently we encountered an issue where store gateway instances were discovered by Thanos Querier, but the announced label set was empty and the announced time range was Min & Max Integer.
This happened after we introduced a new value of an external label, which doubled the size of the label set. Queries for blocks under the scope of the gateway instances came back with empty results. During this time the queriers produced enormous log lines (we suspect they contained all label sets from every discovered store), so we suspected the issue might be related to the total size of the label set. We updated the relabel config for the gateways to limit the scope to a smaller set of external labels, and the gateways were immediately discovered with the expected external labels and time range.
I have looked at the performance of the Info grpc_method as emitted by grpc_client (from the querier) and grpc_server (from the gateway), and both look normal compared to after the fix. The logs do not indicate any issue, and I do not see unusual context deadline exceeded errors in them either. I am having trouble figuring out when this started happening, as I could not find any metric or log showing unusual errors.
Has anyone else seen a similar issue? The lack of a smoking gun in the metrics is scary, as we won't be able to detect this issue the next time it happens.
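In case it helps anyone who wants to alert on this symptom, here is a minimal Python sketch of a check. It assumes the store metadata has already been fetched and parsed from JSON (for example from the Querier's stores listing) and that each entry carries `name`, `labelSets`, `minTime` and `maxTime` fields. Those field names and the helper `find_suspicious_stores` are assumptions for illustration, and "Min & Max Integer" is interpreted here as the int64 bounds.

```python
from typing import Any, Dict, List

# Assumption: "Min & Max Integer" means the signed 64-bit bounds.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def find_suspicious_stores(stores: List[Dict[str, Any]]) -> List[str]:
    """Return names of stores that announce no labels or an unbounded time range.

    `stores` is a list of dicts with the assumed keys
    `name`, `labelSets`, `minTime` and `maxTime`.
    """
    suspicious = []
    for store in stores:
        label_sets = store.get("labelSets") or []
        # A store counts as labelled only if at least one label set is non-empty.
        has_labels = any(len(label_set) > 0 for label_set in label_sets)
        unbounded = (
            store.get("minTime") == INT64_MIN
            and store.get("maxTime") == INT64_MAX
        )
        if not has_labels or unbounded:
            suspicious.append(store.get("name", "<unknown>"))
    return suspicious
```

Run periodically against each querier, a non-empty result could be turned into an alert. Note that some components may legitimately announce no external labels or an open-ended time range, so the check should probably be restricted to store gateways.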