Mimir failed consistency check, unable to query certain blocks #2656
-
Hello, all.
We aren't able to query between 1659628830000 (2022-08-04T16:00:30 UTC) and 1659636300000 (2022-08-04T18:05:00 UTC). It appears that just these 6 blocks are causing the issue. Our Mimir cluster has not been updated recently, and no changes have been made to the configuration. I'm at a bit of a loss as to why these 6 objects would be deleted, and why Mimir would refuse to query them once they were restored to object storage and the index. Any guidance would be greatly appreciated. I've also included a copy of our config, in case it's helpful. We have Mimir deployed as 4 individual single-node services on Docker Swarm Mode.
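For anyone comparing the affected window against block metadata, here is a minimal sketch that converts the millisecond timestamps quoted above into UTC. The helper name `ms_to_utc` is only illustrative and is not part of Mimir.

```python
from datetime import datetime, timezone


def ms_to_utc(ms: int) -> str:
    """Convert a Unix timestamp in milliseconds to an ISO 8601 UTC string."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%SZ"
    )


# The two boundaries of the window that cannot be queried (from the post).
start_ms, end_ms = 1659628830000, 1659636300000

print(ms_to_utc(start_ms), ms_to_utc(end_ms))
print("gap in minutes:", (end_ms - start_ms) / 60000)
```

Working in milliseconds matters here because block time ranges are expressed in the same unit as the timestamps in the post.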
Replies: 5 comments 14 replies
-
I think there are two different things to investigate:
Do you have the compactor logs from around the time these blocks were deleted? Can you find any related log message? I would like to understand whether they were deleted by the compactor (no other Mimir component can delete blocks, so if it wasn't the compactor, the deletion was caused by something outside Mimir's control).
Queriers look up blocks through the bucket index. The bucket index is periodically updated by the compactor (by default every […]). You can manually look up the bucket index in the object storage: it's stored at the path […]
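If you download a copy of the bucket index, a rough sketch like the one below can tell you whether it lists any blocks covering the affected time range and whether any are marked for deletion. Everything about the file layout here is an assumption to verify against your own file: that it is gzip-compressed JSON, that it has a `blocks` list with `block_id`, `min_time` and `max_time` fields, and that it has a `block_deletion_marks` list. The function names are hypothetical helpers, not Mimir tooling.

```python
import gzip
import json


def load_index(local_copy_path):
    """Load a locally downloaded bucket index (ASSUMED: gzip-compressed JSON)."""
    with gzip.open(local_copy_path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def blocks_overlapping(index, start_ms, end_ms):
    """Return IDs of blocks whose time range overlaps [start_ms, end_ms].

    ASSUMED field names: "blocks", "block_id", "min_time", "max_time".
    """
    hits = []
    for block in index.get("blocks", []):
        if block["min_time"] < end_ms and block["max_time"] > start_ms:
            hits.append(block["block_id"])
    return hits


def marked_for_deletion(index):
    """Return the set of block IDs carrying a deletion mark.

    ASSUMED field name: "block_deletion_marks".
    """
    return {mark["block_id"] for mark in index.get("block_deletion_marks", [])}
```

Calling `blocks_overlapping(load_index(path), 1659628830000, 1659636300000)` with the timestamps from the original post would show which blocks, if any, the index lists for the missing window. An empty result would suggest the queriers simply don't know about the restored blocks yet.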
-
I'm in exactly the same situation as @mari-arondeus. This went on for a full weekend, so it was only on Monday that I realized Mimir couldn't store data in Minio because of this error.
So far so good: Mimir could store data in my Minio S3 bucket again. My problem is that any Grafana query that includes the "weekend of death" will fail. Sorry if I'm misunderstanding something here.
-
Also, how can I increase Mimir's 'local' temporary storage to, let's say, a week in order to prevent such issues? Is setting the following in Mimir's config file sufficient?
I've grepped the Mimir logs for a single missing block. This is the first error message I got for this block (which obviously couldn't be written to Minio because of the versioning limit):
Last but not least, sorry to piggyback on this discussion ;) Hopefully my issue is similar enough for this to make sense.
-
Okay... Here's a big WTF moment for me... Unless I've turned into Mr Hyde, I'm pretty sure I didn't do any file operations in the Minio bucket, so I have no explanation as to why there would be any files not owned by my Minio service user. Anyway, glad I got everything to work. As side questions: Any insight would be appreciated, and again, sorry if I ask a lot of questions.
-
Here's another resolution I've found for this problem.
or
I tried to downgrade from 2.14 to 2.13 without success. What I think is that somewhere Mimir 2.14 did not properly write or delete data in the S3 bucket. While checking my data, I noticed that although I have versioning disabled on Minio, I had a lot of versions of files marked for deletion.
Then, navigating into the real path where minio stores the block files (as minio-user):
After these commands I restarted Mimir, waited 30 min for the next compactor cleanup to run, and voilà. Hope this helps anyone ;)