Compactor failing to compact blocks because they contain out-of-order samples. #4573
Comments
This is now hopefully fixed by prometheus/prometheus#9856, but it needs to be updated in Cortex too.
Thanks @pstibrany ...
In that case I would definitely suggest reporting it to Prometheus and including your repro test case. /cc @codesome
A potential fix for this should be something like this: alanprot/prometheus@fbc4206
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.
Not stale: the discussion in Prometheus is still happening.
We can probably close this issue now, as prometheus/prometheus#10624 was also closed? @alanprot
Describe the bug
The compactor is failing to compact blocks because they contain out-of-order samples.
I'm not sure Cortex is the right place to open this issue, as this seems to be an issue in TSDB.
This is not a compactor problem; rather, the ingesters are accepting duplicated samples or samples that are out of order.
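To make "out of order" concrete, here is a minimal sketch of the kind of check that fails on such a block: timestamps within a series must be strictly increasing. The `Sample` type and `FirstOutOfOrder` function are hypothetical names for illustration only, not the Prometheus or Cortex API.

```go
package main

import "fmt"

// Sample is a hypothetical (timestamp, value) pair; it is not the Prometheus type.
type Sample struct {
	T int64   // timestamp in milliseconds
	V float64 // sample value
}

// FirstOutOfOrder returns the index of the first sample whose timestamp is not
// strictly greater than the previous one, or -1 if the sequence is in order.
// A repeated timestamp counts as a violation, like an earlier timestamp does.
func FirstOutOfOrder(samples []Sample) int {
	for i := 1; i < len(samples); i++ {
		if samples[i].T <= samples[i-1].T {
			return i
		}
	}
	return -1
}

func main() {
	// The third sample repeats timestamp 2000 with a different value.
	chunk := []Sample{{T: 1000, V: 1}, {T: 2000, V: 2}, {T: 2000, V: 3}, {T: 3000, V: 4}}
	fmt.Println(FirstOutOfOrder(chunk))
}
```

Running a check like this over the samples of a suspect series points at the first offending sample, which can then be correlated with ingester logs.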
Looking at the problematic block, I can see 3 problematic chunks:
Series 1:
Series 2:
Series 3:
Looking at those blocks, the problems all seem to occur around GMT 01:32:40.806, which, interestingly, is when the ingester was compacting and creating a new block:
We can also see that lots of series were being created in and removed from the ingester memory at the same time (probably because of the GC that follows head compaction).
Looking at the TSDB code, it seems there is a possible race condition when this happens:
Example timeline:
To Reproduce
Run Cortex and send very sparse data containing duplicated samples. The problem will eventually occur.
I also have a unit test that triggers the issue: alanprot/prometheus@e30d871
Expected behavior
The sample (t1, v2) should not be accepted.
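The expected rejection can be sketched as follows. This is a simplified, hypothetical model (the `Head`, `memSeries`, `Append` and `GC` names and the two errors are defined here for illustration and are not the TSDB implementation). It also shows one way a duplicate could slip through, under the assumption that a series' last-sample state is lost when the series is dropped from memory; the issue itself only describes this as a possible race.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical errors, defined locally for this sketch.
var (
	ErrOutOfOrder = errors.New("out of order sample")
	ErrDuplicate  = errors.New("duplicate sample for timestamp")
)

// memSeries remembers the last accepted sample of one series.
type memSeries struct {
	lastT     int64
	lastV     float64
	hasSample bool
}

// Head is a toy in-memory store keyed by series name.
type Head struct {
	series map[string]*memSeries
}

func NewHead() *Head { return &Head{series: map[string]*memSeries{}} }

// Append accepts (t, v) only if t is newer than the last accepted timestamp.
// The same timestamp with a different value is rejected; an identical
// duplicate is treated as a no-op.
func (h *Head) Append(name string, t int64, v float64) error {
	s, ok := h.series[name]
	if !ok {
		s = &memSeries{}
		h.series[name] = s
	}
	if s.hasSample {
		switch {
		case t < s.lastT:
			return ErrOutOfOrder
		case t == s.lastT && v != s.lastV:
			return ErrDuplicate
		case t == s.lastT:
			return nil
		}
	}
	s.lastT, s.lastV, s.hasSample = t, v, true
	return nil
}

// GC drops a series and, with it, the memory of its last sample (assumption).
func (h *Head) GC(name string) { delete(h.series, name) }

func main() {
	h := NewHead()
	fmt.Println(h.Append("s", 1000, 1)) // (t1, v1) is accepted
	fmt.Println(h.Append("s", 1000, 2)) // (t1, v2) is rejected
	h.GC("s")                           // series dropped from memory
	fmt.Println(h.Append("s", 1000, 2)) // (t1, v2) is now accepted
}
```

In this model the check depends entirely on in-memory state, so any path that recreates the series without that state defeats it. Whether that matches the actual TSDB race is for the linked Prometheus discussion to confirm.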
Environment:
Storage Engine
Additional Context