Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: fix flaky test ClientSendsAGoAway #7224

Merged
merged 3 commits into from
May 22, 2024
Merged

Conversation

aranjans
Copy link
Contributor

@aranjans aranjans commented May 10, 2024

Fixes #7160
fix: goaway_test/TestClientSendsAGoAway didn't use to wait for channels to be ready and which is why sometimes it was not able to read the frame and eventually result in test failure. This PR change make sure we read the frame only when the channel's state becomes READY.

RELEASE NOTES: n/a

@aranjans aranjans added this to the 1.64 Release milestone May 10, 2024
@aranjans aranjans force-pushed the aranjans_7160 branch 2 times, most recently from 3753ade to 63ba062 Compare May 10, 2024 10:53
@purnesh42H
Copy link
Contributor

Probably need some more description about the fix

@@ -816,6 +818,7 @@ func (s) TestClientSendsAGoAway(t *testing.T) {
}
}
}()
cc.WaitForStateChange(ctx, connectivity.Connecting)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what would be the max wait here? Timeout for all tests is 7m. Is there a chance it can breach that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Max timeout is as long context is not timed out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't the state to wait be connectivity.Ready?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it depends where the race is happening.

If there's just a race where we call Close before we even start creating the connection, then this is a problem with the test, and Connecting should fix that.

If this is a race where a connection is established and doesn't get a GOAWAY written to it (due to internal state in the ClientConn/addrConn, then Connecting will expose that race and we should fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@purnesh42H WaitForStateChange waits until the connectivity.State of ClientConn changes from sourceState or ctx expires. A true value is returned in former case and false in latter. So we are providing sourceState as Connecting and hence it will wait until state changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think race is detected when client conn is getting closed before we try to send goAway so believe just waiting for clientConn state change will fix this race condition.

@@ -800,6 +800,7 @@ func (s) TestClientSendsAGoAway(t *testing.T) {
for {
f, err := ct.fr.ReadFrame()
if err != nil {
t.Logf("error reading frame: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to push to errCh.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

test/goaway_test.go Show resolved Hide resolved
@@ -816,6 +818,7 @@ func (s) TestClientSendsAGoAway(t *testing.T) {
}
}
}()
cc.WaitForStateChange(ctx, connectivity.Connecting)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it depends where the race is happening.

If there's just a race where we call Close before we even start creating the connection, then this is a problem with the test, and Connecting should fix that.

If this is a race where a connection is established and doesn't get a GOAWAY written to it (due to internal state in the ClientConn/addrConn, then Connecting will expose that race and we should fix it.

@aranjans aranjans modified the milestones: 1.64 Release, 1.65 Release May 12, 2024
@aranjans aranjans assigned aranjans and dfawley and unassigned aranjans May 13, 2024
@dfawley dfawley assigned aranjans and unassigned dfawley May 13, 2024
@dfawley
Copy link
Member

dfawley commented May 13, 2024

From chat offline:

Actually, this structure of the test actually proves there is a bug here that I don't think you're fixing
If a connection is created, we want it to reliably get a goaway. We know a connection exists otherwise we won't get to this part of the code, because the channel passes it along

So I think the test is correct (though it can still be simplified), but there is a race in the code that prevents the GOAWAY from being sent in some cases.

@aranjans aranjans assigned dfawley and unassigned aranjans May 14, 2024
@aranjans
Copy link
Contributor Author

aranjans commented May 14, 2024

@dfawley updated the PR.

@aranjans
Copy link
Contributor Author

Update: this is the code where we are hardcode closing the transport. Will update the PR with the change where we send goAway frame in this case as well.

@aranjans aranjans assigned aranjans and unassigned dfawley May 14, 2024
@aranjans
Copy link
Contributor Author

@dfawley As per our discussion offline, we won't be waiting for all the transport conn to close down. So this PR's change is ready to review as it just makes sure we are waiting for the clientconn to be READY.

@aranjans aranjans assigned dfawley and unassigned aranjans May 16, 2024
test/goaway_test.go Outdated Show resolved Hide resolved
test/goaway_test.go Outdated Show resolved Hide resolved
test/goaway_test.go Outdated Show resolved Hide resolved
@dfawley dfawley assigned aranjans and unassigned dfawley May 16, 2024
@aranjans aranjans assigned dfawley and unassigned aranjans May 17, 2024
test/goaway_test.go Outdated Show resolved Hide resolved
@dfawley dfawley assigned aranjans and unassigned dfawley May 21, 2024
@aranjans aranjans assigned dfawley and unassigned aranjans May 22, 2024
@dfawley dfawley changed the title fix: Flaky test ClientSendsAGoAway test: fix flaky test ClientSendsAGoAway May 22, 2024
@dfawley dfawley merged commit a639c40 into grpc:master May 22, 2024
12 checks passed
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 19, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky test: TestClientSendsAGoAway
3 participants