Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: observe ctx cancellation in Run #29178

Merged
merged 1 commit into from
Aug 29, 2018

Conversation

tbg
Copy link
Member

@tbg tbg commented Aug 28, 2018

Before this patch, roachprod invocations would not observe
ctx cancellation. Or rather they would, but due to the usual
obscure passing of Stdout into the child process of roachprod
the Run call would not return until the child had finished.
As a result, the test would continue running, which is annoying
and also costs money.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@tbg
Copy link
Member Author

tbg commented Aug 28, 2018

@petermattis this is just the first commit from #29174 which doesn't need to be blocked.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: though do you mind addressing the comment I left regarding l.stdout == l.stderr?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)

Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, missed that - done (I think).

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@tbg
Copy link
Member Author

tbg commented Aug 28, 2018

Let's hold off here so that I can try out the "panic escalation" idea (https://play.golang.com/p/0WUEJCAr0su)

@tbg tbg force-pushed the roachtest/stdout branch 2 times, most recently from 455b164 to 01400e8 Compare August 28, 2018 19:26
@tbg tbg requested a review from a team August 28, 2018 19:26
Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@petermattis ready for another look.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Much prefer this approach. Minor question about the test you've added.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/cmd/roachtest/cluster.go, line 225 at r1 (raw file):

	debugStdoutBuffer, _ := circbuf.NewBuffer(1024)
	debugStderrBuffer, _ := circbuf.NewBuffer(1024)

Did you find this useful somewhere? Curious where it came up.


pkg/cmd/roachtest/cluster.go, line 1172 at r1 (raw file):

				// Note that the trick here is that we panicked explicitly below,
				// which somehow "overrides" the Goexit which is supposed to be
				// un-recoverable, but we do need to recover to return an error.

Oy! This is confusing. Thanks for the comment.


pkg/cmd/roachtest/test_example_adversarial.go, line 34 at r1 (raw file):

}

func testHarness(ctx context.Context, t *test, c *cluster) {

Can this be an actual test (i.e TestHarnest(t *testing.T))? Yeah, it forks sleep and echo, but that should be ok from a test.

@tbg tbg force-pushed the roachtest/stdout branch 2 times, most recently from c221bbd to 05314ad Compare August 29, 2018 10:07
Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/cmd/roachtest/cluster.go, line 225 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Did you find this useful somewhere? Curious where it came up.

I think this will save more time than anything else in this PR - before you'd get something like "command XYZ returned exit status 1" and no details of what the process output. Take a look at the commit message and note how it captures the latest part of stdout/stderr. Basically I want to be able to look at the issue and see the failure instead of clicking through 12 teamcity pages and downloading up to five artifacts. I hope that this achieves that.


pkg/cmd/roachtest/cluster.go, line 1172 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Oy! This is confusing. Thanks for the comment.

Done.


pkg/cmd/roachtest/test_example_adversarial.go, line 34 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Can this be an actual test (i.e TestHarnest(t *testing.T))? Yeah, it forks sleep and echo, but that should be ok from a test.

Done. Feels good to have decent coverage for this, but I seriously pity the next person who'll have to touch it. Probably gonna be me :-)

Before this patch, roachprod invocations would not observe
ctx cancellation. Or rather they would, but due to the usual
obscure passing of Stdout into the child process of roachprod
the Run call would not return until the child had finished.
As a result, the test would continue running, which is annoying
and also costs money.

Also fixes up the handling of calling `c.t.Fatal` on a monitor
goroutine (using what is perhaps unspecified behavior of the
Go runtime). Anyway, the result is that you can do basically
whatever inside of a monitor and get away with it:

```go
m.Go(func(ctx context.Context) error {
	// Make sure the context cancellation works (not true prior to the PR
	// adding this test).
	return c.RunE(ctx, c.Node(1), "sleep", "2000")
})
m.Go(func(ctx context.Context) error {
	// This will call c.t.Fatal which also used to wreak havoc on the test
	// harness. Now it exits just fine (and all it took were some mean hacks).
	// Note how it will exit with stderr and stdout in the failure message,
	// which is extremely helpful.
	c.Run(ctx, c.Node(1), "echo foo && echo bar && notfound")
	return errors.New("impossible")
})
m.Wait()
```

now returns

```
--- FAIL: tpmc/w=1/nodes=3 (0.24s)
...,errgroup.go:58: /Users/tschottdorf/go/bin/roachprod run local:1 -- echo foo && echo bar && notfound returned:
	stderr:
	bash: notfound: command not found
	Error:  exit status 127

	stdout:
	foo
	bar
	: exit status 1
...,tpcc.go:661: Goexit() was called
FAIL
```

Release note: None
@tbg
Copy link
Member Author

tbg commented Aug 29, 2018

bors r=petermattis

@craig
Copy link
Contributor

craig bot commented Aug 29, 2018

Build failed

@tbg
Copy link
Member Author

tbg commented Aug 29, 2018

bors r=petermattis

TestSplitAt.

craig bot pushed a commit that referenced this pull request Aug 29, 2018
29178: roachtest: observe ctx cancellation in Run r=petermattis a=tschottdorf

Before this patch, roachprod invocations would not observe
ctx cancellation. Or rather they would, but due to the usual
obscure passing of Stdout into the child process of roachprod
the Run call would not return until the child had finished.
As a result, the test would continue running, which is annoying
and also costs money.

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
@craig
Copy link
Contributor

craig bot commented Aug 29, 2018

Build succeeded

@craig craig bot merged commit 5ba6c4c into cockroachdb:master Aug 29, 2018
tbg added a commit to tbg/cockroach that referenced this pull request Aug 30, 2018
Accidentally changed this in cockroachdb#29178.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Aug 30, 2018
Accidentally changed this in cockroachdb#29178.

Release note: None
craig bot pushed a commit that referenced this pull request Aug 30, 2018
29329: storage: skip TestClosedTimestampCanServe r=a-robinson a=tschottdorf

It won't be the first one I'm looking into.

Release note: None

29339: roachtest: fix flake in TestMonitor r=petermattis a=tschottdorf

Accidentally changed this in #29178.

Fixes #29325.

Release note: None

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
tbg added a commit to tbg/cockroach that referenced this pull request Aug 30, 2018
Accidentally changed this in cockroachdb#29178.

Release note: None
craig bot pushed a commit that referenced this pull request Aug 31, 2018
29390: backport-2.1: storage, roachtest, cli: assorted backports r=petermattis a=tschottdorf

storage: remove test-only data race

   The test was stopping the engine before the stopper, so the compactor was
   sometimes able to use the engine while it was being closed.

   Fixes #29302.

   roachtest: improve TestMonitor

   Add two more test cases about the exit status of the `monitor` invocation
   itself.

   roachtest: fix flake in TestMonitor

   Accidentally changed this in #29178.

   cli: add hex option to debug keys

   This was used in #29252 and I imagine I'll want to use it again whenever we
   see the consistency checker fail in the future.

   storage: skip TestClosedTimestampCanServe

   It won't be the first one I'm looking into.


Release note: None

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
@tbg tbg deleted the roachtest/stdout branch September 21, 2018 11:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants