Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: fix potential goroutine leak in TestUpdateAddresses_RetryFromFirstAddr #5023

Conversation

charlesxsh
Copy link
Contributor

@charlesxsh charlesxsh commented Dec 1, 2021

This PR aims to fix possible goroutine leak in TestUpdateAddresses_RetryFromFirstAddr.

=== RUN TestUpdateAddresses_RetryFromFirstAddr
clientconn_test.go:xxxx: timed out waiting for server1 to be contacted
clientconn_test.go:xxxx: accept tcp 127.0.0.1:32795: use of closed network connection

goroutine 10 [chan receive]:
google.golang.org/grpc.TestUpdateAddresses_RetryFromFirstAddr.func1(0xab6d80, 0xc00000c888, 0xc0002c0a00, 0xc00002a880, 0xc00002a900)
	/fuzz/target/clientconn_test.go:1309 +0x205
created by google.golang.org/grpc.TestUpdateAddresses_RetryFromFirstAddr
	/fuzz/target/clientconn_test.go:1287 +0x60a

Any fatal happened later, like following example:

select {
	case <-server1ContactedFirstTime:
	case <-timeout:
		t.Fatal("timed out waiting for server1 to be contacted")
	}

If fatal happened, server1 goroutine will still be blocked at range stateNotifications since no close operation will be made.

for s := range stateNotifications {

RELEASE NOTES: none

@charlesxsh charlesxsh force-pushed the shihaox/TestUpdateAddresses_RetryFromFirstAddr branch from 01b6079 to 2349a53 Compare December 1, 2021 03:30
@dfawley dfawley added this to the 1.44 Release milestone Dec 6, 2021
@menghanl
Copy link
Contributor

menghanl commented Dec 7, 2021

Can you share how you made this test fail?
Is it flaky (and you just ran it many times)? Or you had to change some code to make it fail?

If I understand correctly, this leak only happens when the test fails. I would think it's OK to have a leak in a failing test.

@github-actions
Copy link

This PR is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@github-actions github-actions bot added the stale label Dec 13, 2021
@charlesxsh
Copy link
Contributor Author

@menghanl This issue is found by our tool through fuzzing. In this case, it is triggered by Fatal call in the test, however, it can also be triggered by any unexpected exit. So it might be better fixed instead of letting it be there?

@menghanl
Copy link
Contributor

How does this fuzzing work? Does it change just the data or the code?
If there's a situation this test can fatal, that's a more serious problem and we should fix that first.

@charlesxsh
Copy link
Contributor Author

Oh no worry, nothing wrong with the test or business logic. It instrumented the code. It simply let us see the result in some corner case.

@dfawley dfawley assigned menghanl and unassigned charlesxsh Dec 22, 2021
@songlh
Copy link

songlh commented Jan 4, 2022

There is another problem in this unit test. If close(closeServer2) (line 990) is skipped due to panics, the goroutine created at line 915 is leaked.

@menghanl
Copy link
Contributor

menghanl commented Jan 5, 2022

I think this PR is good as is. Let's make separate PRs if there are further improvements.

@menghanl menghanl changed the title fix goroutine leak in TestUpdateAddresses_RetryFromFirstAddr test: fix potential goroutine leak in TestUpdateAddresses_RetryFromFirstAddr Jan 5, 2022
@menghanl menghanl merged commit 2fb1ac8 into grpc:master Jan 5, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 5, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants