tests: avoid t.Parallel() when the top test has defers #37072

knz · 2019-04-24T10:19:27Z

Sub-tests that invoke t.Parallel() get to run concurrently with
their parent test, and may be delayed arbitrarily past beyond the end
of the termination of the parent test (including beyond the execution
of its defer calls).

This means it's unsafe to call t.Parallel() with
e.g. leaktest.AfterTest().

Release note: None

cockroach-teamcity · 2019-04-24T10:19:35Z

This change is

tbg

Reviewed 2 of 2 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @RaduBerinde)

pkg/rpc/context_test.go, line 803 at r1 (raw file):

		pName := fmt.Sprintf("client%sserver,server%sclient", connIcon(c.partitionC2S), connIcon(c.partitionS2C))
		t.Run(kaName+"/"+pName, func(t *testing.T) {
			// TODO(knz): t.Parallel disabled until leaktest.AfterTest learns

Hmm, that's a bummer. Test takes 28s (up from <4s with enough parallelism). I think here I'd rip out the subtests and start them as goroutines with a waitgroup, writing into an error channel. That way we get even better parallelism than with t.Parallel and the main test will wait. Or, if we decide that sometimes use of t.Parallel is ok and we don't want to lint it away, we can keep using it and we'll just add the waitgroup to the test.

var wg sync.WaitGroup
wg.Add(len(testCases))
errCh := make(chan error, len(testCases))
for _, c := range testCases {
  go func(c testCase) {
    errCh <- errors.Wrapf(runTestCase(c), "%+v", c)
  }(c)
}
wg.Wait()
close(errCh)

for err := range errCh {
  t.Errorf("%+v", err)
}

pkg/sql/physical_props_test.go, line 363 at r1 (raw file):

				// how to wait for all sub-tests to complete.
				//
				// t.Parallel()

This one doesn't change the duration of the test (I verified), so I'd just nuke this.

pkg/sql/physical_props_test.go, line 572 at r1 (raw file):

			// how to wait for all sub-tests to complete.
			//
			//			t.Parallel()

Verified that this can be nuked as well.

pkg/sql/physical_props_test.go, line 769 at r1 (raw file):

		tc := testCases[i]
		t.Run(tc.name, func(t *testing.T) {
			// TODO(knz): t.Parallel disabled until leaktest.AfterTest learns

Not tested, but pretty sure all the tests in this file don't need t.Parallel.

RaduBerinde

Interesting. Seems like there is a missing API in test that we should file an issue for - either a function to register a cleanup to happen after the parent waits for child subtests, or a function to wait for all subtests.

We could use a wait group though, replacing every t.Parallel() call with the block wg.Add(1); defer wg.Done(); t.Parallel()

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @RaduBerinde)

RaduBerinde

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @tbg)

pkg/rpc/context_test.go, line 803 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Hmm, that's a bummer. Test takes 28s (up from <4s with enough parallelism). I think here I'd rip out the subtests and start them as goroutines with a waitgroup, writing into an error channel. That way we get even better parallelism than with t.Parallel and the main test will wait. Or, if we decide that sometimes use of t.Parallel is ok and we don't want to lint it away, we can keep using it and we'll just add the waitgroup to the test.
var wg sync.WaitGroup
wg.Add(len(testCases))
errCh := make(chan error, len(testCases))
for _, c := range testCases {
  go func(c testCase) {
    errCh <- errors.Wrapf(runTestCase(c), "%+v", c)
  }(c)
}
wg.Wait()
close(errCh)

for err := range errCh {
  t.Errorf("%+v", err)
}
</blockquote></details>

Ah, I made the same suggestion. Why do we need the error channel though? The test infrastructure should still work, we just need the `Wait()`.

tbg

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @RaduBerinde)

pkg/rpc/context_test.go, line 803 at r1 (raw file):

Previously, RaduBerinde wrote…

Ah, I made the same suggestion. Why do we need the error channel though? The test infrastructure should still work, we just need the Wait().

Using testing.T on a goroutine is not supported by the testing framework, and will cause problems. In particular, t.Fatal off the test goroutine will result in an uncaught runtime.Goexit. Note how runTestCase doesn't even get to use the t. That's to avoid this problem in the first place. Using t.Error is fine btw, but that's not a pattern I want to put out there.

RaduBerinde · 2019-04-24T10:59:21Z

pkg/rpc/context_test.go, line 803 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Using testing.T on a goroutine is not supported by the testing framework, and will cause problems. In particular, t.Fatal off the test goroutine will result in an uncaught runtime.Goexit. Note how runTestCase doesn't even get to use the t. That's to avoid this problem in the first place. Using t.Error is fine btw, but that's not a pattern I want to put out there.

But you don't need a goroutine. Just call Run like before and put defer wg.Done() inside that. Run will return when the inner test calls Parallel (it has to, or the parent can't continue to call more Runs).

RaduBerinde · 2019-04-24T11:27:26Z

Raphael points out that the subtests block inside Parallel() until the parent test completes, so my suggestion won't work. I filed golang/go#31651.

knz

RFAL

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @tbg)

pkg/rpc/context_test.go, line 803 at r1 (raw file):

Previously, RaduBerinde wrote…

But you don't need a goroutine. Just call Run like before and put wg.Done() inside that. Run will return when the inner test calls Parallel (it has to, or the parent can't continue to call more Runs).

Modified to use Tobias' suggestion.

pkg/sql/physical_props_test.go, line 363 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

This one doesn't change the duration of the test (I verified), so I'd just nuke this.

Done.

pkg/sql/physical_props_test.go, line 572 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Verified that this can be nuked as well.

Done.

pkg/sql/physical_props_test.go, line 769 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Not tested, but pretty sure all the tests in this file don't need t.Parallel.

Done.

tbg

optional suggestion: add a lint against t.Parallel following the example here (ironic that there would be a call to t.Parallel() in that same snippet):

cockroach/pkg/testutils/lint/lint_test.go

Lines 715 to 726 in e3cbf33

    
           t.Parallel() 
        
           cmd, stderr, filter, err := dirCmd( 
        
           	pkgDir, 
        
           	"git", 
        
           	"grep", 
        
           	"-nE", 
        
           	`\.Clone\([^)]`, 
        
           	"--", 
        
           	"*.go", 
        
           	":!util/protoutil/clone_test.go", 
        
           	":!util/protoutil/clone.go", 
        
           )

Reviewed 2 of 2 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @knz and @RaduBerinde)

pkg/rpc/context_test.go, line 803 at r1 (raw file):

Previously, knz (kena) wrote…

Modified to use Tobias' suggestion.

Optional nit: move this out of the test so that it doesn't have access to a *testing.T, or do t := (*testing.T)(nil) here to reliably catch any accidental calls in the method.

knz

optional suggestion: add a lint against t.Parallel

Good idea. Done. Will merge this PR once #36952 has merged, as this will remove the one remaining call to t.Parallel().

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @tbg)

pkg/rpc/context_test.go, line 803 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Optional nit: move this out of the test so that it doesn't have access to a *testing.T, or do t := (*testing.T)(nil) here to reliably catch any accidental calls in the method.

Done. This was a good call -- there was a remaining use of t inside the function.

tbg

Reviewed 1 of 1 files at r3, 4 of 4 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @knz)

pkg/rpc/context_test.go, line 803 at r1 (raw file):

Previously, knz (kena) wrote…

Done. This was a good call -- there was a remaining use of t inside the function.

🐛 🔨

pkg/testutils/lint/lint_test.go, line 775 at r4 (raw file):

			stream.GrepNot(`// SAFE FOR TESTING`),
		), func(s string) {
			t.Errorf("\n%s <- forbidden, use a sync.WaitGroup instead", s)

Refer to golang/go#31651 in the message to satisfy the curious.

knz

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @tbg)

pkg/testutils/lint/lint_test.go, line 775 at r4 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Refer to golang/go#31651 in the message to satisfy the curious.

Done.

tbg

Reviewed 1 of 1 files at r5.
Reviewable status: complete! 1 of 0 LGTMs obtained

Sub-tests that invoke `t.Parallel()` get to run concurrently with their parent test, and may be delayed arbitrarily past beyond the end of the termination of the parent test (including beyond the execution of its `defer` calls). This means it's unsafe to call `t.Parallel()` with e.g. `leaktest.AfterTest()`. This patch removes `t.Parallel` from the test in `sql/physical_props_test.go` and modifies `pkg/rpc/TestGRPCKeepaliveFailureFailsInflightRPCs` to use a wait group instead. Release note: None

…ightRPCs Since this now runs on its own goroutine, it is invalid for it to access the surrounding `testing.T`. This patch moves the code so that `t` is clearly out of scope. Release note: None

This suggests using `sync.WaitGroup` instead. Release note: None

knz · 2019-04-24T13:58:56Z

bors r=tbg,RaduBerinde

37072: tests: avoid t.Parallel() when the top test has defers r=tbg,RaduBerinde a=knz Sub-tests that invoke `t.Parallel()` get to run concurrently with their parent test, and may be delayed arbitrarily past beyond the end of the termination of the parent test (including beyond the execution of its `defer` calls). This means it's unsafe to call `t.Parallel()` with e.g. `leaktest.AfterTest()`. Release note: None Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>

craig · 2019-04-24T14:17:17Z

Build succeeded

GitHub CI (Cockroach)

knz requested review from a team, RaduBerinde and tbg April 24, 2019 10:19

tbg reviewed Apr 24, 2019

View reviewed changes

RaduBerinde approved these changes Apr 24, 2019

View reviewed changes

tbg reviewed Apr 24, 2019

View reviewed changes

knz force-pushed the 20190424-parallel branch from ce21a47 to 9ab7189 Compare April 24, 2019 11:34

knz commented Apr 24, 2019

View reviewed changes

tbg approved these changes Apr 24, 2019

View reviewed changes

knz requested a review from a team April 24, 2019 13:07

knz commented Apr 24, 2019

View reviewed changes

knz force-pushed the 20190424-parallel branch from bebcaa3 to cb5f1d6 Compare April 24, 2019 13:09

tbg approved these changes Apr 24, 2019

View reviewed changes

knz force-pushed the 20190424-parallel branch from cb5f1d6 to 0f1a488 Compare April 24, 2019 13:29

knz commented Apr 24, 2019

View reviewed changes

tbg approved these changes Apr 24, 2019

View reviewed changes

knz added 3 commits April 24, 2019 15:53

pkg/rpc: extract the test code from TestGRPCKeepaliveFailureFailsInfl…

9c4fcee

…ightRPCs Since this now runs on its own goroutine, it is invalid for it to access the surrounding `testing.T`. This patch moves the code so that `t` is clearly out of scope. Release note: None

lint: add a linter to forbid uses of t.Parallel()

6f30625

This suggests using `sync.WaitGroup` instead. Release note: None

knz force-pushed the 20190424-parallel branch from 0f1a488 to 6f30625 Compare April 24, 2019 13:54

knz mentioned this pull request Apr 24, 2019

release-19.1: de-flake tests #37076

Merged

craig bot merged commit 6f30625 into cockroachdb:master Apr 24, 2019

	t.Parallel()
	cmd, stderr, filter, err := dirCmd(
	pkgDir,
	"git",
	"grep",
	"-nE",
	`\.Clone\([^)]`,
	"--",
	"*.go",
	":!util/protoutil/clone_test.go",
	":!util/protoutil/clone.go",
	)

tests: avoid t.Parallel() when the top test has defers #37072

tests: avoid t.Parallel() when the top test has defers #37072

Uh oh!

Conversation

knz commented Apr 24, 2019

Uh oh!

cockroach-teamcity commented Apr 24, 2019

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

Uh oh!

RaduBerinde left a comment

Choose a reason for hiding this comment

Uh oh!

RaduBerinde left a comment

Choose a reason for hiding this comment

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

Uh oh!

RaduBerinde commented Apr 24, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RaduBerinde commented Apr 24, 2019

Uh oh!

knz left a comment

Choose a reason for hiding this comment

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

Uh oh!

knz left a comment

Choose a reason for hiding this comment

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

Uh oh!

knz left a comment

Choose a reason for hiding this comment

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

Uh oh!

knz commented Apr 24, 2019

Uh oh!

craig bot commented Apr 24, 2019

Build succeeded

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

RaduBerinde commented Apr 24, 2019 •

edited

Loading