Compactor can fail with "block with not healthy index found ... series have an average of 1.000 out-of-order chunks: 0.000 of these are exact duplicates (in terms of data and time range)" message #3569
Comments
We recently hit this issue as well.
Could you paste the exact log error you've got?
I am wondering, can this issue be caused by prometheus/prometheus#8055? Because #8055 seems to introduce out-of-order samples.
It shouldn't. What you got is out-of-order chunks within the same block, while the issue you linked is about two different blocks overlapping in time.
@pracucci and @pstibrany, for this issue, do you think it would be an improvement to change the compactor not to halt the whole compaction process when a level-1 block is bad, given that there are replicas of that block?
Hi, we also encountered this issue with the following message on our compactor:
If it is of any help, you can find the content of the not-healthy block here: https://dl.plik.ovh/file/snWjOJ69TkrmdcUN/D8sbecutaDZBVAsN/01FM53C7H2SR3N111QZMA3TK8P.tar.gz Regards, Julien.
A better workaround for this issue is to mark the block as no-compact.
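Marking the block as no-compact can be done with the Thanos CLI; a minimal sketch, assuming the object-store configuration lives in a hypothetical `bucket.yml` and using the block ID shared above (exact flags may vary across Thanos versions, and this assumes your compactor version honors no-compact markers):

```shell
# Upload a no-compact marker for the unhealthy block so the compactor
# skips it instead of halting compaction for the whole tenant.
# Block ID and config path are examples from this thread.
thanos tools bucket mark \
  --objstore.config-file=bucket.yml \
  --id=01FM53C7H2SR3N111QZMA3TK8P \
  --marker=no-compact-mark.json \
  --details="block has out-of-order chunks"
```

Unlike renaming, this leaves the block queryable while keeping it out of future compaction groups.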
Compactor can fail to compact block with message like this:
When this happens, compaction for the given user will not continue, because the compactor will retry compacting this block over and over, failing each time.
Upon further investigation, this is a 2h block produced by an ingester. It's not clear why out-of-order chunks would be written; this is likely a bug in the Prometheus TSDB code.
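To find which blocks trip this index health check before the compactor does, the Thanos CLI can scan a bucket read-only; a minimal sketch, assuming a hypothetical `bucket.yml` object-store config (the `--issues` value has changed name between Thanos releases):

```shell
# Scan all blocks in the bucket for known index issues
# (out-of-order chunks, out-of-order labels, duplicate series)
# without modifying anything.
thanos tools bucket verify \
  --objstore.config-file=bucket.yml \
  --issues=index_known_issues
```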
Similar bugs in Thanos:
A workaround is to rename the block so that it's not included in the compaction.
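The rename can be done directly in the object store; a hypothetical sketch for S3, where the bucket name, tenant prefix, and quarantine prefix are all placeholders to adapt to your setup:

```shell
# Move the bad block out of the tenant's prefix so the compactor
# no longer sees it. All paths below are placeholder examples.
BLOCK=01FM53C7H2SR3N111QZMA3TK8P
aws s3 mv --recursive \
  "s3://cortex-blocks/tenant-1/${BLOCK}/" \
  "s3://cortex-blocks/quarantine/tenant-1/${BLOCK}/"
```

Note this also removes the block from queries, so the no-compact marker approach mentioned above is usually preferable when replicas are incomplete.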