compact: Get rid of syncDelay and handle eventual consistency & partial upload differently #377
Comments
A related scenario is when an upload is in progress, so the sidecar has only uploaded the chunks and possibly the index file of a block. Currently, when the compactor comes across a partially uploaded block, the whole compaction process is aborted. One example scenario for this is a server crashing during the upload of the index file. In this case no compaction or downsampling can be done until the server is back up and the upload is reattempted and completed, or the block directory is manually cleaned up.
Yes indeed. What do you think the sane logic should do in the case of a partially uploaded block? I think we need to divide this problem (and conquer it):

A) First of all, we need to detect what state the block is in. Is it partially uploaded because it is being uploaded right now? (a) Or is it partially uploaded because of some disaster? (b)

B) The second question is what to do after we detect what is going on.

There is also a C) issue here.
Aside: I think this relates to #318, as it's about how to handle download errors.

Without knowing anything about the TSDB format or how compaction works, I'm probably oversimplifying here, but I think the sane logic for the compactor/store is to treat the partial block as if it were not there at all. On a filesystem I would have wanted to put all data into a temporary directory and move that directory into place in one atomic operation once everything is there. I don't think cloud storage supports this, but it does guarantee that partial file uploads don't happen(?). So to mimic the move, treat any missing file as an indication that "the move" has not happened yet (see the sketch after this comment).

A) I think only the sidecar that was uploading the block can know this? And that sidecar may be down for maintenance and unable to provide any answer. I guess the cluster could look at the external labels in the chunks and see if any members of the cluster have the same labels and try to detect it that way, but that seems fragile.

B) I would argue that the response to both situations is the same for the compactor/store: wait until the next run of the compactor / pretend the partial block is not there. The sidecar/ruler should be responsible for fixing partial uploads. It should not have marked the block as uploaded in thanos.shipper.json, since that only happens after all the files are successfully uploaded, so this should happen automatically. To clean up cases where the servers are gone for good, a rule could be added to the "bucket validate" command to delete partial blocks that are older than (insert a time range too long to be useful any more).

C) Does handling of a partially uploaded block need to differ from how blocks that have not started uploading at all are handled? Consider the case where the Thanos sidecar has been down for maintenance. If it is turned on again just before compaction happens, we get a partial upload; if it's turned on again 5 minutes later, no data has been uploaded at all yet.
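For what it's worth, here is a minimal Go sketch of the "pretend it is not there" rule. This is not Thanos code: the `Bucket` interface, `isBlockComplete`, and the assumption that `meta.json` is uploaded last (so its presence marks a completed "move") are all illustrative.

```go
package main

import (
	"context"
	"fmt"
	"path"
)

// Bucket is a minimal, hypothetical object-store interface for this sketch;
// the real Thanos objstore interface is richer.
type Bucket interface {
	Exists(ctx context.Context, name string) (bool, error)
}

// isBlockComplete mimics the atomic "move into place": a block counts as
// present only once its meta.json exists, assuming the shipper uploads
// meta.json last so its presence marks a completed upload.
func isBlockComplete(ctx context.Context, bkt Bucket, blockID string) (bool, error) {
	return bkt.Exists(ctx, path.Join(blockID, "meta.json"))
}

// filterCompleteBlocks drops partially uploaded blocks, so the compactor
// behaves as if they were not there at all.
func filterCompleteBlocks(ctx context.Context, bkt Bucket, ids []string) ([]string, error) {
	var complete []string
	for _, id := range ids {
		ok, err := isBlockComplete(ctx, bkt, id)
		if err != nil {
			return nil, fmt.Errorf("check block %s: %w", id, err)
		}
		if ok {
			complete = append(complete, id)
		}
	}
	return complete, nil
}

// memBucket is an in-memory stand-in used only for the demo below.
type memBucket map[string]struct{}

func (m memBucket) Exists(_ context.Context, name string) (bool, error) {
	_, ok := m[name]
	return ok, nil
}

func main() {
	bkt := memBucket{
		"01ABC/meta.json":   {}, // fully uploaded block
		"01DEF/chunks/0001": {}, // partial upload: meta.json not there yet
	}
	ids, err := filterCompleteBlocks(context.Background(), bkt, []string{"01ABC", "01DEF"})
	if err != nil {
		panic(err)
	}
	fmt.Println(ids) // [01ABC]
}
```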
Dup of #298
`syncDelay` in the compactor was used to mitigate two problems: eventual consistency of the object store, and partially uploaded blocks (uploads still in progress).

`syncDelay` in theory was fine because we just assumed that new blocks are delayed, but it causes lots of overlaps in blocks, because "new" is based on the ULID (creation time), which does not necessarily mean a new time range. An example failure scenario: a new block appears for time range `[t-100, t-50)` (let's say block 'A'). Block 'A' is new, so the sync delay keeps it invisible to the compactor. Some fresh block 'B' (past the sync delay) appears to the compactor for time range `[t-20, t-10)`. It can happen that the compactor assumes the time range of `A` is simply missing and irreversibly compacts blocks without `A`. Once `A` becomes visible again after the sync delay, we get an `Overlap`.
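To make the ULID-versus-time-range mismatch concrete, here is a minimal Go sketch using github.com/oklog/ulid. This is not Thanos code; `blockMeta` and `isNewByULID` are illustrative. A just-created block can carry a fresh ULID while covering an old time range, so a sync-delay filter keyed on ULID age hides exactly the block the compactor needs.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"time"

	"github.com/oklog/ulid"
)

// blockMeta is a simplified stand-in for a block's meta.json.
type blockMeta struct {
	ULID    ulid.ULID
	MinTime int64 // start of the covered time range, Unix ms
	MaxTime int64 // end of the covered time range, Unix ms
}

// isNewByULID is the problematic notion of "new": it looks only at the
// ULID creation time, never at the time range the block actually covers.
func isNewByULID(m blockMeta, syncDelay time.Duration) bool {
	created := ulid.Time(m.ULID.Time())
	return time.Since(created) < syncDelay
}

func main() {
	// Block 'A': created just now (fresh ULID), but covering the old
	// range [t-100m, t-50m). The sync-delay filter hides it, so the
	// compactor may compact its neighbours without it -> Overlap later.
	now := time.Now()
	a := blockMeta{
		ULID:    ulid.MustNew(ulid.Timestamp(now), rand.Reader),
		MinTime: now.Add(-100 * time.Minute).UnixMilli(),
		MaxTime: now.Add(-50 * time.Minute).UnixMilli(),
	}
	fmt.Println(isNewByULID(a, 30*time.Minute)) // true: hidden despite old data
}
```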
So we fixed it with the additional condition `is new block after syncDelay || compactionLevel > 1`. This is inconsistent, because if the object store is NOT strongly consistent we can have overlaps anyway(!).

With the new PR for auto repairs (#375), we can have `compactionLevel == 1` and still get broken compaction, because a newly repaired block needs to bypass `syncDelay`, which violates the "eventual consistency handling". We support that by adding `Source` meta to ThanosMeta.

There are various options for how to fix this and REMOVE the `syncDelay` workaround entirely. Will edit this issue to describe them.
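As a reference for the discussion, here is a minimal Go sketch of the workaround condition described above. It is illustrative only: the function and field names, and the `Source` values, are not the actual Thanos ones.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"time"

	"github.com/oklog/ulid"
)

// meta holds just the fields the filter needs; Source is the marker added
// for repaired blocks in PR #375 (values here are illustrative).
type meta struct {
	ULID            ulid.ULID
	CompactionLevel int
	Source          string
}

// visibleToCompactor encodes the patched condition: consider a block once
// it has aged past syncDelay, OR when it is already compacted (level > 1),
// OR when it is a freshly repaired level-1 block. The latter two escape
// hatches skip the eventual-consistency protection entirely, which is the
// inconsistency this issue wants to remove.
func visibleToCompactor(m meta, syncDelay time.Duration) bool {
	olderThanDelay := time.Since(ulid.Time(m.ULID.Time())) > syncDelay
	return olderThanDelay || m.CompactionLevel > 1 || m.Source == "repair"
}

func main() {
	fresh := meta{
		ULID:            ulid.MustNew(ulid.Now(), rand.Reader),
		CompactionLevel: 1,
		Source:          "sidecar",
	}
	fmt.Println(visibleToCompactor(fresh, 30*time.Minute)) // false: still hidden
}
```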