-
Notifications
You must be signed in to change notification settings - Fork 360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Count KV retries not tries in Graveler #7541
Count KV retries not tries in Graveler #7541
Conversation
Metric `graveler_kv_retries` currently counts "tries", i.e. at least one for each try. So this includes the actual operations rate: if lakeFS performs 100 ops/sec, then it will increment graveler_kv_retries by _at least_ 100 / sec. That defeats using this metric to detect when KV retries impact performance. Instead count from 0, i.e. count "_retries_". This is changelog-worthy because people may be tracking this. But given that it's an internal metric, and that we rarely examine it, this is not a breaking change.
No linked issues found. Please add the corresponding issues in the pull request description. |
pkg/graveler/graveler.go
Outdated
@@ -1291,7 +1291,7 @@ func (g *Graveler) UpdateBranch(ctx context.Context, repository *RepositoryRecor | |||
// monitorRetries reports the number of retries of operation while updating | |||
// branchID of repository. | |||
func (g *Graveler) monitorRetries(ctx context.Context, tries int, repositoryID RepositoryID, branchID BranchID, operation string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should also change the variable names passed to this method, i.e. change from retries
to tries
, or change the parameter name to retries and pass the correct value without modifying the argument value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should also change the variable names passed to this method, i.e. change from retries
to tries
, or change the parameter name to retries and pass the correct value without modifying the argument value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea!
Please take another look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 questions
pkg/graveler/graveler.go
Outdated
func (g *Graveler) monitorRetries(ctx context.Context, tries int, repositoryID RepositoryID, branchID BranchID, operation string) { | ||
// branchID of repository. Parameter retries should be 0 if the operation | ||
// succeeded on the very first attempt. | ||
func (g *Graveler) monitorRetries(ctx context.Context, retries int, repositoryID RepositoryID, branchID BranchID, operation string) { | ||
kvRetriesCounter.WithLabelValues(operation).Add(float64(tries)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's tries?
retries := try | ||
if retries > options.MaxTries { | ||
retries = options.MaxTries | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you enforce this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't, I don't want to, and this is more like a mistake: it counts the number of times that we tried, rather than the number of times that we failed. If retries > options.MaxTries
then we failed too many times and decided to give up. This means that we actually failed one extra time.
Technically this is the same number of retries as if the last time succeeded. But I think this count is possibly nicer. But I have no firm opinion on this, and will revert this change if you prefer.
That's not how you subtract 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Dumb errors on my part in not pushing latest change.
Explained why I am willing to count options.MaxTries retries even when there were actually only options.MaxTries-1 retries. But will go with whichever version you prefer - let me know!
retries := try | ||
if retries > options.MaxTries { | ||
retries = options.MaxTries | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't, I don't want to, and this is more like a mistake: it counts the number of times that we tried, rather than the number of times that we failed. If retries > options.MaxTries
then we failed too many times and decided to give up. This means that we actually failed one extra time.
Technically this is the same number of retries as if the last time succeeded. But I think this count is possibly nicer. But I have no firm opinion on this, and will revert this change if you prefer.
pkg/graveler/graveler.go
Outdated
l := g.log(ctx).WithFields(logging.Fields{ | ||
"tries": tries, | ||
"tries": retries, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"tries": retries, | |
"retries": retries, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
last one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Please take another (last???) look...
pkg/graveler/graveler.go
Outdated
l := g.log(ctx).WithFields(logging.Fields{ | ||
"tries": tries, | ||
"tries": retries, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
Metric
graveler_kv_retries
currently counts "tries", i.e. at least one foreach try. So this includes the actual operations rate: if lakeFS performs
100 ops/sec, then it will increment graveler_kv_retries by at least 100 /
sec. That defeats using this metric to detect when KV retries impact
performance.
Instead count from 0, i.e. count "retries".
This is changelog-worthy because people may be tracking this. But given
that it's an internal metric, and that we rarely examine it, this is not a
breaking change.