Receive drops all incoming data when there are clock issues #6167
Comments
I am not sure if we can easily solve it because it's a Prometheus TSDB behavior. Enabling out-of-order ingestion could help, but it's still an experimental feature that hasn't been battle tested.
Wouldn't it be fairly easy for Thanos to just drop incoming samples that are more than XX (configurable?) seconds in the future according to the server time of thanos-receive?
+1 on this issue; we encountered this today. The TSDB head is at March 12, 2023 as I comment today. Any ideas, @fpetkovski?
So we enabled this experimental flag.
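For readers landing here, a minimal sketch of what enabling out-of-order ingestion on the receiver can look like, assuming a Thanos version that exposes an out-of-order time window flag (the exact flag name and availability depend on your release; check `thanos receive --help`):

```sh
# Hedged sketch: let the receiver's TSDB accept samples up to 30 minutes older
# than the newest sample in its head instead of rejecting them outright.
# Flag name and default vary by Thanos version; other deployment-specific
# flags (labels, hashring config, etc.) are omitted.
thanos receive \
  --tsdb.path=/var/thanos/receive \
  --tsdb.out-of-order.time-window=30m \
  --remote-write.address=0.0.0.0:19291 \
  --grpc-address=0.0.0.0:10901 \
  --http-address=0.0.0.0:10902
```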
@defreng What you suggested makes sense to me. Having some sort of an upper bound on the timestamp should at least prevent big losses of data.
Hi guys, I put up a PR to address this issue. Since it is my first time contributing, I welcome some early feedback (I will add a changelog entry if it is good to go).
There is now a flag to avoid ingesting samples that are too far into the future, hence closing this.
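For completeness, a hedged sketch of how the upper bound on sample timestamps mentioned above might be configured; the flag spelling shown here is an assumption and should be verified against `thanos receive --help` for your release:

```sh
# Hedged sketch: reject individual samples whose timestamps are more than one
# hour ahead of the receiver's wall clock, so a single misconfigured client
# cannot drag the TSDB head into the future and block everyone else.
# Verify the exact flag name and default for the Thanos version in use.
thanos receive \
  --tsdb.path=/var/thanos/receive \
  --tsdb.too-far-in-future.time-window=1h \
  --remote-write.address=0.0.0.0:19291
```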
Thanos, Prometheus and Golang version used:
Thanos version:
Prometheus using remote_write:
Object Storage Provider: s3
What happened:
As discussed in the closed issue linked below, Thanos receive stops processing incoming data from sources whenever it receives a data series that is ahead in time: it starts dropping all metrics from any Prometheus instance using remote_write. The behavior is exactly what @mtlang described in #3765 (comment).
From my perspective, it looks like if any source sending data to Thanos receive forwards a metric with a wrong (future) timestamp, Thanos stops treating the current time as the "proper" one and starts dropping all metrics, stating:
On the Prometheus side, we are receiving 409 Conflict responses.
What you expected to happen:
The data series that is ahead in time should be ignored by Thanos receive, allowing the rest of the Prometheus instances, which have correct NTP time, to keep sending metrics.
How to reproduce it (as minimally and precisely as possible):
Two Prometheus instances (in my case, in different k8s clusters) configured to send metrics using remote_write to a Thanos receive in another cluster. Then change the date in one of them (a hedged sketch of such a setup is included at the end of this report).
Full logs to relevant components:
Thanos receive:
Prometheus clients:
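To make the reproduction steps above concrete, here is a hedged sketch of the setup; the receiver hostname, port, paths, and the clock-skew command are illustrative placeholders, not values taken from this report:

```sh
# Hedged reproduction sketch (hypothetical endpoint and paths).

# 1. Point each Prometheus instance at the receiver via remote_write:
cat >> /etc/prometheus/prometheus.yml <<'EOF'
remote_write:
  - url: http://thanos-receive.example.internal:19291/api/v1/receive
EOF

# 2. On ONE of the two clusters, skew a node clock into the future
#    (illustrative; in practice this can also happen through broken NTP sync):
date -s "2030-01-01 00:00:00"

# 3. Observe the receiver: once it ingests a future-dated sample, remote_write
#    requests from the correctly-clocked Prometheus start getting 409 Conflict
#    responses.
```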