-
Notifications
You must be signed in to change notification settings - Fork 110
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! I missed that multiple tests running after each other can cause an issue, too. As the previous test's goroutine did not stop yet.
I think the test hooks introduce a lot of complexity. I wonder if that necessary. Have you considered using sync.WaitGroup
? Something like:
// set searchTimeOut to low value so the test is quicker
originalSearchTimeout := searchTimeout
ctx, cancel := context.WithCancel(context.Background())
var wg sync.WaitGroup
defer func() {
cancel()
wg.Wait()
searchTimeout = originalSearchTimeout
}()
searchTimeout = 250 * time.Millisecond
wg.Add(1)
go func() {
fetcher.run(ctx, peersToSkip)
wg.Done()
}()
That is still a lot of code, but I guess familiarity helps to understand, i.e., wait groups and context should be familiar to people working with Go. Also, the above would not require production code change.
That being said, there would 1 case remain where it's impossible to use the above concept. Namely TestFetcherFactory
. I think FetcherFactory.New
must be changed as it starts a goroutine but does not provide any way for clear termination (possibility to wait for shutdown).
@frncmx Thanks for the suggestion, but you already answered why I did not use wait group approach. There has to be a change in FetcherFactory.New which would accept a synchronization argument (wait group or a channel) which would actually only be used in test, not in production. I do not think that test hook is complex, it is even used in standard library, and there is also a test for it in this pr. My first attempt was to actually pass a channel fetcher.run, but wait group is also a nice trick. Would you prefer to change FetcherFactory.New to accept the wait group and remove the test hook? |
I looked into what effort would it take to change the interface of Clearly:
That being said what would you think about adding a quit channel to In code;
|
|
I still don't like the hook that much even though I seen something similar in standard library once and I admit you tested the hook. I don't like the hook because the usage of the hook doesn't feel clean and simple to me in the tests. I actually liked your first approach better. Changing a private function signature for a test (even though I know that should not be the default approach). But then I realized that did not fix all the data races. So, please bare with me. :) |
@frncmx thanks for the explanations, they are very helpful. Yes, quit channel is cleaner approach, but we will have that field in fetcher that is actually not used in production, only in tests. If it is ok to have that overhead in production, I will change the implementation, but we would still need to change FetcherFactory.New function to be able to wait in tests until internal fetcher quit channel is closed. That can be done in a few ways, by passing the channel as the argument, having a callback, or extending NetFetcher interface. What would you prefer? Or maybe you would like to take over this PR, given the amount of work you invested in it? |
Rejected in favour to ethereum/go-ethereum#18469. |
This PR addresses #1109 issue and fixes another set of data races.
This PR introduces no changes to function signatures, but adds one if statement to Fetcher.run with insignificant performance degradation on production code.
Fetcher.run function is controlled by the provided Context, but there is no synchronization between the termination of Fetcher.run goroutine and test termination. Some of the fetcher tests change the global variable searchTimeout and Fetcher.run reads it. If Fetcher.run goroutine is not terminated before serachTimeout is set back to the original value (with defer), or another tests starts, there is a race between reading and writing serachTimeout.
The first race is described in #1109, and the second type of race is
go test -race -count 1 -v github.com/ethereum/go-ethereum/swarm/network
where two goroutines from two different tests are racing on the same global value.
Synchronization is achieved by providing a test hook function that waits for a channel to be closed when Fetcher.run goroutine is done. The right order of defer functions in Fetcher tests is important to avoid races on changing searchTimeout and testHookFetcherRun. Function testHookFetcherRun is added to avoid changes to FetcherFactory.New function signature which calls Fetcher.runs in a goroutine in some of the tests.
With Fetcher.run goroutine synchronization, a deadlock in mockRequester.doRequest function was detected when context is canceled and nothing is receiving on requestC channel.
All races and problems handled in this PR are not affecting production code, just tests.
All Fetcher tests should finish without races:
go test -race -count 1 -v -run '^(TestFetcherSingleRequest|TestFetcherCancelStopsFetcher|TestFetcherCancelStopsRequest|TestFetcherOfferUsesSource|TestFetcherOfferAfterRequestUsesSourceFromContext|TestFetcherRetryOnTimeout|TestFetcherFactory|TestFetcherRequestQuitRetriesRequest|TestRequestSkipPeer|TestRequestSkipPeerExpired|TestRequestSkipPeerPermanent|TestFetcherMaxHopCount|TestSetTestHookFetcherRun)$' github.com/ethereum/go-ethereum/swarm/network