Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Count KV retries not tries in Graveler #7541

Merged
merged 4 commits into from
Mar 11, 2024

Conversation

arielshaqed
Copy link
Contributor

Metric graveler_kv_retries currently counts "tries", i.e. at least one for
each try. So this includes the actual operations rate: if lakeFS performs
100 ops/sec, then it will increment graveler_kv_retries by at least 100 /
sec. That defeats using this metric to detect when KV retries impact
performance.

Instead count from 0, i.e. count "retries".

This is changelog-worthy because people may be tracking this. But given
that it's an internal metric, and that we rarely examine it, this is not a
breaking change.

Metric `graveler_kv_retries` currently counts "tries", i.e. at least one for
each try.  So this includes the actual operations rate: if lakeFS performs
100 ops/sec, then it will increment graveler_kv_retries by _at least_ 100 /
sec.  That defeats using this metric to detect when KV retries impact
performance.

Instead count from 0, i.e. count "_retries_".

This is changelog-worthy because people may be tracking this.  But given
that it's an internal metric, and that we rarely examine it, this is not a
breaking change.
Copy link

No linked issues found. Please add the corresponding issues in the pull request description.
Use GitHub automation to close the issue when a PR is merged

@arielshaqed arielshaqed added pr/merge-if-approved Reviewer: please feel free to merge if no major comments performance include-changelog PR description should be included in next release changelog area/KV Improvements to the KV store implementation minor-change Used for PRs that don't require issue attached labels Mar 10, 2024
Copy link

E2E Test Results - DynamoDB Local - Local Block Adapter

10 passed

@@ -1291,7 +1291,7 @@ func (g *Graveler) UpdateBranch(ctx context.Context, repository *RepositoryRecor
// monitorRetries reports the number of retries of operation while updating
// branchID of repository.
func (g *Graveler) monitorRetries(ctx context.Context, tries int, repositoryID RepositoryID, branchID BranchID, operation string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should also change the variable names passed to this method, i.e. change from retries to tries, or change the parameter name to retries and pass the correct value without modifying the argument value.

Copy link
Contributor

@Jonathan-Rosenberg Jonathan-Rosenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should also change the variable names passed to this method, i.e. change from retries to tries, or change the parameter name to retries and pass the correct value without modifying the argument value.

Copy link
Contributor Author

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea!

Please take another look.

Copy link
Contributor

@Jonathan-Rosenberg Jonathan-Rosenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 questions

func (g *Graveler) monitorRetries(ctx context.Context, tries int, repositoryID RepositoryID, branchID BranchID, operation string) {
// branchID of repository. Parameter retries should be 0 if the operation
// succeeded on the very first attempt.
func (g *Graveler) monitorRetries(ctx context.Context, retries int, repositoryID RepositoryID, branchID BranchID, operation string) {
kvRetriesCounter.WithLabelValues(operation).Add(float64(tries))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's tries?

Comment on lines -1747 to -1750
retries := try
if retries > options.MaxTries {
retries = options.MaxTries
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you enforce this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't, I don't want to, and this is more like a mistake: it counts the number of times that we tried, rather than the number of times that we failed. If retries > options.MaxTries then we failed too many times and decided to give up. This means that we actually failed one extra time.

Technically this is the same number of retries as if the last time succeeded. But I think this count is possibly nicer. But I have no firm opinion on this, and will revert this change if you prefer.

That's not how you subtract 1.
Copy link
Contributor Author

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Dumb errors on my part in not pushing latest change.

Explained why I am willing to count options.MaxTries retries even when there were actually only options.MaxTries-1 retries. But will go with whichever version you prefer - let me know!

Comment on lines -1747 to -1750
retries := try
if retries > options.MaxTries {
retries = options.MaxTries
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't, I don't want to, and this is more like a mistake: it counts the number of times that we tried, rather than the number of times that we failed. If retries > options.MaxTries then we failed too many times and decided to give up. This means that we actually failed one extra time.

Technically this is the same number of retries as if the last time succeeded. But I think this count is possibly nicer. But I have no firm opinion on this, and will revert this change if you prefer.

l := g.log(ctx).WithFields(logging.Fields{
"tries": tries,
"tries": retries,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"tries": retries,
"retries": retries,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Contributor

@Jonathan-Rosenberg Jonathan-Rosenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

last one

Copy link
Contributor Author

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Please take another (last???) look...

l := g.log(ctx).WithFields(logging.Fields{
"tries": tries,
"tries": retries,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@arielshaqed arielshaqed enabled auto-merge (squash) March 11, 2024 10:44
@arielshaqed arielshaqed merged commit 85db4b0 into master Mar 11, 2024
35 checks passed
@arielshaqed arielshaqed deleted the bug/count-retries-from-zero-not-tries-from-one branch March 11, 2024 11:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/KV Improvements to the KV store implementation include-changelog PR description should be included in next release changelog minor-change Used for PRs that don't require issue attached performance pr/merge-if-approved Reviewer: please feel free to merge if no major comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants