Handle (ignore) partial uploads on downsampling and apply retention cycles #1335
Comments
It fixes compactor crashes like
This happens because, during a sync while a sidecar is still uploading a block, we might hit a block with no meta.json.
Hey @bwplotka, I can help with this.
Oh yes, please @lx223! Essentially we have
Any progress, @lx223?
Sorry for the delay; I have been busy with work. I should have something up for review soon.
PR: #1394 PTAL
Still valid I think, ref: https://cloud-native.slack.com/archives/CK5RSSC10/p1572453579422500
It's getting quite important to get this in. Another point of failure:
#1394 was our first attempt and can be used as a base (: I will get into this in my free time, unless there are other volunteers (:
This replaces 4 inconsistent meta.json sync places in other components.

Fixes: #1335
Fixes: #1919
Fixes: #1300

* One place for sync logic for both compactor and store.
* Corrupted disk cache for meta.json is handled gracefully.
* Blocks without meta.json are handled properly for all compactor phases.
* Synchronize was not taking into account deletion by removing meta.json.
* Prepare for future implementation of https://thanos.io/proposals/201901-read-write-operations-bucket.md/
* Better observability for the synchronize process.
* More logs for the store startup process.
* Remove Compactor Syncer.
* Added metric for partialUploadAttempt deletions.
* More tests.

TODO in separate PR:
* More observability for index-cache loading / adding time.

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Fixes: #1335
Fixes: #1919
Fixes: #1300

* Cleanup of meta files now starts only if the block being uploaded is older than 2 days (only a mitigation).
* Blocks without meta.json are handled properly for all compactor phases.
* Prepare for future implementation of https://thanos.io/proposals/201901-read-write-operations-bucket.md/
* Added metric for partialUploadAttempt deletions and delayed it.
* More tests.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
AC:
We have nice concurrent logic here: https://github.com/improbable-eng/thanos/blob/10f84a7d7f06813fda97e17d609a9969fcda037f/pkg/compact/compact.go#L169
We did not apply it here: https://github.com/improbable-eng/thanos/blob/10f84a7d7f06813fda97e17d609a9969fcda037f/cmd/thanos/downsample.go#L145
And not on retention: https://github.com/improbable-eng/thanos/blob/1d582af96b2cd412ade46dcfa07e6d40ffb3c176/pkg/compact/retention.go#L16