Tools bucket verify must validate chunks #4881

Closed
aymericDD opened this issue Nov 19, 2021 · 6 comments

Comments

@aymericDD
Contributor

aymericDD commented Nov 19, 2021

Is your proposal related to a problem?

After an S3 outage, some data was corrupted or removed, which caused compactor and store errors because some chunk files were missing or corrupted.

Error log:

level=error ts=2021-11-15T09:15:25.252549999Z caller=main.go:157 err="group 0@15489514322847869503: compact blocks [/thanos/compact/0@15489514322847869503/01FK83WSKRS741JPZSH3ANQMM2 /thanos/compact/0@15489514322847869503/01FKD8JX9S4ZGTJSZ4EDMGKPD8 /thanos/compact/0@15489514322847869503/01FKJDE2TC0PEZM5PEMB31AWZS /thanos/compact/0@15489514322847869503/01FKQJ7TNQEKSBNXWBHPMTG6AC /thanos/compact/0@15489514322847869503/01FKWQ2R1AB4GJ1WRSDF2TM7QM /thanos/compact/0@15489514322847869503/01FM1VV3XXPB2DB305W3WCTAZN /thanos/compact/0@15489514322847869503/01FM7133CXJBKXTC0JJDZNWE53]: populate block: chunk iter: cannot populate chunk 17179869528: checksum mismatch expected:58b36271, actual:95f3491d\ncompaction\nmain.runCompact.func7\n\t/app/cmd/thanos/compact.go:416\nmain.runCompact.func8\n\t/app/cmd/thanos/compact.go:465\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371\ncompact command failed\nmain.main\n\t/app/cmd/thanos/main.go:157\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371"

Describe the solution you'd like

The S3 datacenter has been fixed, but I need to check whether all blocks in the bucket are valid. To validate them, I used the thanos tools bucket verify command, but it only validates the index: a missing or corrupted chunk file is not detected. thanos tools bucket verify should also validate the hash of the chunk files.

The Prometheus TSDB already does this (https://github.com/prometheus/prometheus/blob/main/tsdb/chunks/chunks.go#L549), and so does the promtool CLI (https://github.com/prometheus/prometheus/blob/main/cmd/promtool/tsdb.go#L602), but that approach requires downloading all the chunks and the index file, which could be very expensive and slow for large blocks (more data to transfer, longer runs).

I like @bwplotka's proposal in #1787 to store the hash of the chunks in the metadata file. That way, we could request only the metadata from the remote storage, retrieve the chunk hash, and compare it, without downloading the chunk files. Maybe this could be implemented around this line: https://github.com/thanos-io/thanos/blob/main/pkg/block/index.go#L319?

Describe alternatives you've considered

  • Wait for errors from the store or compactor, but that affects end users.
  • Download all blocks (chunks, index and metadata) and validate them with the promtool CLI, but that is very expensive.

Additional context

Nothing to report.

@yeya24
Contributor

yeya24 commented Nov 19, 2021

PR #3031 already supports storing hashes in the metadata file, but per file, not per chunk.

It is also possible to read each chunk's data on the fly from the object store.

@aymericDD
Contributor Author

Thanks for your reply @yeya24 👍
So if this feature is enabled, we can check whether the chunk files are correct. I will take a look when I have time.

@stale

stale bot commented Mar 2, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there is no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need to!). Alternatively, use the remind command if you wish to be reminded at some point in the future.

@stale stale bot added the stale label Mar 2, 2022
@stale

stale bot commented Apr 17, 2022

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Apr 17, 2022
@aymericDD
Contributor Author

Still needed

@jimethn

jimethn commented Aug 12, 2022

Still needed. We don't have a good way to fix (or at least identify) corrupted chunks.


3 participants