.*: Don't report "object not found" as error for Get/GetRange #2365
Conversation
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
I believe it's unrelated to my change.
Indeed, it seems unrelated. Reran the CI.
IMHO it would be cleaner to do this in `deletionmark.go` itself, because some code might actually try to `Get` an object that it assumes exists, and a missing object in that case should probably be reflected in the metrics. For example, a user could still run into consistency issues even with all of these safeguards in place, but with this `if` it would be impossible to detect such a problem easily.
I've created #2369, which implements an alternative fix in
From my perspective, bucket-level metrics should track a failure only when a bucket-related failure occurs. If the object does not exist, it's not a bucket-level error (i.e. a networking error, storage unavailable, etc.). I personally think this PR is better than #2369.
Looking. NotFound is indeed a problem: most of the time it's a user error, but we have traditionally treated it as a server error when it's unexpected.
One way of doing it is to always use
#2369 :)
Or actually, you did that version too: #2369. Will reach out to you offline.
I think I like #2369 more, because we will still be notified in cases where "not found" is unexpected... Another option is to extend the API a bit and add something like
Slack discussion: https://cloud-native.slack.com/archives/CL25937SP/p1585903230146500
You are right, but we still need some metric for when this happens unexpectedly. A fourth alternative is another metric (: But this would be inconsistent with our bucket-level metrics, because we don't separate user-level errors at the moment.
The alternative we agreed on offline: #2370. Let me know if that makes sense 🤗
#2370 seems to fix the problem as well. Thanks!
Thanks to you for starting the discussion and for the ideas 👍
Thanks Bartek!
Changes
Don't treat "object not found" as failures when reporting metrics for Get/GetRange operations.
The recently introduced `IgnoreDeletionMarkFilter` uses the `Get` operation to check whether a block is marked for deletion, and since many blocks are not, the bucket store returns an "object not found" error. Because `metricBucket` reports this as a failed operation, it looks as if there were many failed operations. But this is an expected outcome for `Get`/`GetRange`, so I don't think it should be reported as a failure.

Alternatively, we could modify `IgnoreDeletionMarkFilter` to check for the file first, but that would require two operations.

Verification
See the decreased number of failures reported by `thanos_objstore_bucket_operation_failures_total` for the `Get` operation.