Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

util: fix retry.WithMaxAttempts context cancelled before run. #25612

Merged
merged 1 commit into from
May 17, 2018

Conversation

windchan7
Copy link
Contributor

@windchan7 windchan7 commented May 17, 2018

If context gets cancelled right after retry.WithMaxAttempts runs, the function
passed to it will never gets run. Now retry.WithMaxAttempts will at least run
the function once otherwise an error will be returned.

Making this change because there are places such as show_cluster_setting.go
require the passed in function to be run at least once. Otherwise there will
be seg fault.

Fixes: #25600.
Fixes: #25603.
Fixes: #25570.
Fixes: #25567.
Fixes: #25566.
Fixes: #25511.
Fixes: #25485.
Release note: None

@windchan7 windchan7 requested a review from a-robinson May 17, 2018 14:43
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@windchan7
Copy link
Contributor Author

@a-robinson Testing could be really tricky because I've run make stress PKG=./pkg/server locally. It's been over 200 runs but I still can't observe the seg fault.

@tbg
Copy link
Member

tbg commented May 17, 2018

@windchan7 try reproducing on a gceworker (./scripts/gceworker.sh create; ./scripts/gceworker.sh start) and if that doesn't work, try with `STRESSFLAGS='-p 100')

Also, to have GH auto-close the issues, you need to have each one on its own line:

Fixes #a.
Fixes #b.
...

You can just edit the PR message and leave the commit message as is.

Copy link
Contributor

@a-robinson a-robinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find!

@a-robinson Testing could be really tricky because I've run make stress PKG=./pkg/server locally. It's been over 200 runs but I still can't observe the seg fault.

You'd probably have a better time just stressing a test from one of the other issues, since there we at least know which test it happened in. And running on a gceworker, as @tschottdorf suggested.

func WithMaxAttempts(ctx context.Context, opts Options, n int, fn func() error) error {
if n <= 0 {
return errors.Errorf("max attempts should not be 0 or below")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case this ever triggers, it'd likely be helpful to include the value of n as part of the message

@@ -159,6 +165,9 @@ func WithMaxAttempts(ctx context.Context, opts Options, n int, fn func() error)
return nil
}
}
if err == nil {
err = errors.Errorf("function never gets run")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this error message a little clearer? For example, errors.Wrap("did not run function", ctx.Err())

If context gets cancelled right after `retry.WithMaxAttempts` runs, the function
passed to it will never gets run. Now `retry.WithMaxAttempts` will at least run
the function once otherwise an error will be returned.

Making this change because there are places such as `show_cluster_setting.go`
require the passed in function to be run at least once. Otherwise there will
be seg fault.

Fixes: 25600, 25603, 25570, 25567, 25566, 25511, 25485.
Release note: None
@tbg
Copy link
Member

tbg commented May 17, 2018

LGTM but wait for Alex!


Reviewed 1 of 1 files at r1, 1 of 1 files at r2.
Review status: all files reviewed at latest revision, 2 unresolved discussions.


pkg/util/retry/retry.go, line 169 at r1 (raw file):

	}
	if err == nil {
		err = errors.Errorf("function never gets run")

s/gets/

or better what Alex said.


Comments from Reviewable

@windchan7
Copy link
Contributor Author

@a-robinson Hey Alex, is it good to go?

Copy link
Contributor

@a-robinson a-robinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, LGTM. Thanks!

@windchan7
Copy link
Contributor Author

Thanks a lot!

@windchan7
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented May 17, 2018

Build failed

@windchan7
Copy link
Contributor Author

@tschottdorf In this case, I should file an issue about the test flakiness on TestApplyCmdLeaseError , right?

@windchan7
Copy link
Contributor Author

bors r+

@a-robinson
Copy link
Contributor

@tschottdorf In this case, I should file an issue about the test flakiness on TestApplyCmdLeaseError, right?

Normally, yes, but in this case I'll fix it up as part of #25625, since I introduced a handful of similar flakes. Thanks for commenting about it or else I wouldn't have noticed!

@craig
Copy link
Contributor

craig bot commented May 17, 2018

Canceled (will resume)

craig bot pushed a commit that referenced this pull request May 17, 2018
24883: dep: Bump grpc-go version r=a-robinson a=a-robinson

And pull in new packages -- in particular, the encoding/proto package
isn't needed for compilation but is needed at runtime.

Release note: None

---------------------

We should wait to merge this until they've cut the 1.12 release tag so that we aren't just at an arbitrary commit in their git history but I'm sending out the PR now so that I (or whoever would have done this) don't have to deal with debugging the missing encoding/proto package when it comes time to merge this.

As tested in #17370 (comment), this gives a 5-10% boost in whole-cluster throughput and improved tail latencies when run with a highly concurrent workload. It appears to have little performance effect for lower-concurrency workloads.

25410:  sql: run schema changes after CREATE TABLE in txn r=vivekmenezes a=vivekmenezes

Includes a commit from #25362 and should be reviewed after that change.

25612: util: fix `retry.WithMaxAttempts` context cancelled before run. r=windchan7 a=windchan7

If context gets cancelled right after `retry.WithMaxAttempts` runs, the function
passed to it will never gets run. Now `retry.WithMaxAttempts` will at least run
the function once otherwise an error will be returned.

Making this change because there are places such as `show_cluster_setting.go`
require the passed in function to be run at least once. Otherwise there will
be seg fault.

Fixes: #25600. 
Fixes: #25603.
Fixes: #25570. 
Fixes: #25567.
Fixes: #25566.
Fixes: #25511.
Fixes: #25485.
Release note: None

25625: storage: Adding testing knob to disable automatic lease renewals r=a-robinson a=a-robinson

In order to fix the test flakes caused by automatic lease renewals

Fixes #25537
Fixes #25540
Fixes #25568
Fixes #25573
Fixes #25576
Fixes #25589
Fixes #25594
Fixes #25599
Fixes #25605
Fixes #25620

Release note: None

Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>
Co-authored-by: Vivek Menezes <vivek@cockroachlabs.com>
Co-authored-by: Victor Chen <victor@cockroachlabs.com>
@windchan7
Copy link
Contributor Author

👍

@a-robinson
Copy link
Contributor

I think this should be backported to 2.0 as well, given that it fixes a segfault.

@craig
Copy link
Contributor

craig bot commented May 17, 2018

Build succeeded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants