-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
util: fix retry.WithMaxAttempts
context cancelled before run.
#25612
Conversation
@a-robinson Testing could be really tricky because I've run |
@windchan7 try reproducing on a gceworker ( Also, to have GH auto-close the issues, you need to have each one on its own line: Fixes #a. You can just edit the PR message and leave the commit message as is. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice find!
@a-robinson Testing could be really tricky because I've run
make stress PKG=./pkg/server
locally. It's been over 200 runs but I still can't observe the seg fault.
You'd probably have a better time just stressing a test from one of the other issues, since there we at least know which test it happened in. And running on a gceworker, as @tschottdorf suggested.
pkg/util/retry/retry.go
Outdated
func WithMaxAttempts(ctx context.Context, opts Options, n int, fn func() error) error { | ||
if n <= 0 { | ||
return errors.Errorf("max attempts should not be 0 or below") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case this ever triggers, it'd likely be helpful to include the value of n
as part of the message
pkg/util/retry/retry.go
Outdated
@@ -159,6 +165,9 @@ func WithMaxAttempts(ctx context.Context, opts Options, n int, fn func() error) | |||
return nil | |||
} | |||
} | |||
if err == nil { | |||
err = errors.Errorf("function never gets run") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this error message a little clearer? For example, errors.Wrap("did not run function", ctx.Err())
If context gets cancelled right after `retry.WithMaxAttempts` runs, the function passed to it will never gets run. Now `retry.WithMaxAttempts` will at least run the function once otherwise an error will be returned. Making this change because there are places such as `show_cluster_setting.go` require the passed in function to be run at least once. Otherwise there will be seg fault. Fixes: 25600, 25603, 25570, 25567, 25566, 25511, 25485. Release note: None
LGTM but wait for Alex! Reviewed 1 of 1 files at r1, 1 of 1 files at r2. pkg/util/retry/retry.go, line 169 at r1 (raw file):
s/gets/ or better what Alex said. Comments from Reviewable |
@a-robinson Hey Alex, is it good to go? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, LGTM. Thanks!
Thanks a lot! |
bors r+ |
Build failed |
@tschottdorf In this case, I should file an issue about the test flakiness on |
bors r+ |
Normally, yes, but in this case I'll fix it up as part of #25625, since I introduced a handful of similar flakes. Thanks for commenting about it or else I wouldn't have noticed! |
Canceled (will resume) |
24883: dep: Bump grpc-go version r=a-robinson a=a-robinson And pull in new packages -- in particular, the encoding/proto package isn't needed for compilation but is needed at runtime. Release note: None --------------------- We should wait to merge this until they've cut the 1.12 release tag so that we aren't just at an arbitrary commit in their git history but I'm sending out the PR now so that I (or whoever would have done this) don't have to deal with debugging the missing encoding/proto package when it comes time to merge this. As tested in #17370 (comment), this gives a 5-10% boost in whole-cluster throughput and improved tail latencies when run with a highly concurrent workload. It appears to have little performance effect for lower-concurrency workloads. 25410: sql: run schema changes after CREATE TABLE in txn r=vivekmenezes a=vivekmenezes Includes a commit from #25362 and should be reviewed after that change. 25612: util: fix `retry.WithMaxAttempts` context cancelled before run. r=windchan7 a=windchan7 If context gets cancelled right after `retry.WithMaxAttempts` runs, the function passed to it will never gets run. Now `retry.WithMaxAttempts` will at least run the function once otherwise an error will be returned. Making this change because there are places such as `show_cluster_setting.go` require the passed in function to be run at least once. Otherwise there will be seg fault. Fixes: #25600. Fixes: #25603. Fixes: #25570. Fixes: #25567. Fixes: #25566. Fixes: #25511. Fixes: #25485. Release note: None 25625: storage: Adding testing knob to disable automatic lease renewals r=a-robinson a=a-robinson In order to fix the test flakes caused by automatic lease renewals Fixes #25537 Fixes #25540 Fixes #25568 Fixes #25573 Fixes #25576 Fixes #25589 Fixes #25594 Fixes #25599 Fixes #25605 Fixes #25620 Release note: None Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com> Co-authored-by: Vivek Menezes <vivek@cockroachlabs.com> Co-authored-by: Victor Chen <victor@cockroachlabs.com>
👍 |
I think this should be backported to 2.0 as well, given that it fixes a segfault. |
Build succeeded |
If context gets cancelled right after
retry.WithMaxAttempts
runs, the functionpassed to it will never gets run. Now
retry.WithMaxAttempts
will at least runthe function once otherwise an error will be returned.
Making this change because there are places such as
show_cluster_setting.go
require the passed in function to be run at least once. Otherwise there will
be seg fault.
Fixes: #25600.
Fixes: #25603.
Fixes: #25570.
Fixes: #25567.
Fixes: #25566.
Fixes: #25511.
Fixes: #25485.
Release note: None