downsample: retry objstore related errors #7194
Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>
@@ -419,7 +420,7 @@ func processDownsampling(

 	err = block.Upload(ctx, logger, bkt, resdir, hashFunc)
 	if err != nil {
-		return errors.Wrapf(err, "upload downsampled block %s", id)
+		return compact.NewRetryError(errors.Wrapf(err, "upload downsampled block %s", id))
From scanning the code, I don't think this is consumed anywhere right now, right? I don't think this will lead to retries currently!
As I understand it, processDownsampling returns the error to downsampleBucket, which returns it to compactMainFn in cmd/thanos/compact.go. There, in cmd/thanos/compact.go, an if compact.IsRetryError(err) { check should trigger a retry
assuming we run the compactor with the --wait flag.
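To make the propagation described above concrete, here is a minimal, self-contained sketch using only the standard library. The retryError type, newRetryError, isRetryError, the block ID and the error messages are hypothetical stand-ins, not the actual compact.NewRetryError / compact.IsRetryError implementation; the sketch only shows that a marker error survives further wrapping on its way up to the caller that decides whether to retry.

```go
package main

import (
	"errors"
	"fmt"
)

// retryError is a hypothetical marker type: it tags the wrapped error as
// retriable without changing its message.
type retryError struct{ err error }

func (e retryError) Error() string { return e.err.Error() }
func (e retryError) Unwrap() error { return e.err }

// newRetryError marks err as retriable (stand-in for compact.NewRetryError).
func newRetryError(err error) error { return retryError{err: err} }

// isRetryError reports whether any error in the chain carries the marker
// (stand-in for compact.IsRetryError).
func isRetryError(err error) bool {
	var re retryError
	return errors.As(err, &re)
}

// processDownsampling simulates an upload that fails with a storage error
// and returns it marked as retriable.
func processDownsampling(id string) error {
	uploadErr := errors.New("simulated objstore failure")
	return newRetryError(fmt.Errorf("upload downsampled block %s: %w", id, uploadErr))
}

// downsampleBucket adds its own context; the marker stays in the chain.
func downsampleBucket() error {
	if err := processDownsampling("example-block"); err != nil {
		return fmt.Errorf("downsample bucket: %w", err)
	}
	return nil
}

func main() {
	err := downsampleBucket()
	if isRetryError(err) {
		// In a long-running loop, this is where the next iteration would
		// be scheduled instead of exiting.
		fmt.Println("retriable:", err)
		return
	}
	fmt.Println("fatal:", err)
}
```

The marker only has an effect if some caller checks for it and runs the work again, which is why the long-running mode matters here.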
Maybe it's better to retry right here instead of returning an error. This way the compactor will not have to go through the whole cycle again and downsample the block from the beginning.
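For comparison, retrying in place could look roughly like the sketch below. This is not code from the PR: retryUpload, its parameters, the fixed backoff and the fake upload in demo are all hypothetical, and a real change would wrap the existing upload call rather than a closure.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retryUpload calls upload up to maxAttempts times, waiting backoff between
// attempts. It stops early if the context is cancelled. Hypothetical helper.
func retryUpload(ctx context.Context, maxAttempts int, backoff time.Duration, upload func(context.Context) error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = upload(ctx); err == nil {
			return nil
		}
		if attempt == maxAttempts {
			break
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
		}
	}
	return fmt.Errorf("upload failed after %d attempts: %w", maxAttempts, err)
}

// demo runs retryUpload against a fake upload that fails the first
// `failures` times and then succeeds. It returns how often upload was called.
func demo(failures, maxAttempts int) (calls int, err error) {
	flaky := func(context.Context) error {
		calls++
		if calls <= failures {
			return errors.New("simulated transient objstore error")
		}
		return nil
	}
	err = retryUpload(context.Background(), maxAttempts, time.Millisecond, flaky)
	return calls, err
}

func main() {
	calls, err := demo(2, 5)
	fmt.Println("calls:", calls, "err:", err)
}
```

With this shape the already downsampled data stays on disk between attempts, so only the upload is repeated.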
I wanted to keep this less intrusive and retry the same way the compaction process is retried.
I can retry the upload/download calls instead. But then it would be good to apply the same logic in compaction and retry upload/download calls there too. And, maybe, expose a parameter --objstore.file-retries: "Maximum number of retries for fetch/upload block files from object storage."
What do you say, @fpetkovski?
You are right, my bad!
@xBazilio sounds good, we can keep things consistent for now. Nothing is set in stone anyway, so we can improve if needed.
What are the next steps? Can it be merged?