-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flaky unit test TestServerSinglePort #5519
Comments
Can i work on this issue ? |
@babugeet Yes, go ahead. There is no need for issue assignment. Feel free to make a PR |
@yurishkuro I tried to replicate it, but I got no such error |
I know, this is why it's hard to fix. But the issue exists. It's not even clear from the error which specific server is causing it. A couple of things we could do to make it easier to debug: wrap the errors with contextual messages, and perhaps add logging to the Close methods to see in which order the servers are being closed when the error happens. |
@yurishkuro I've been troubleshooting a persistent timeout issue with the TestServerSinglePort test . Here's what I've done so far:
Despite these changes, the test still times out after 5 minutes. It seems the issue might be within the server code itself. Any guidance would be greatly appreciated. |
The error does not seem to reproduce consistently, which makes it difficult to diagnose the exact cause. |
@vermaaatul07 why timeout? The error log in the ticket is an attempt to close an already closed connection. |
@vermaaatul07 the changes you described seem like useful addition even if they don't resolve the issue. Why don't you open a PR for those? Maybe with increased logging we will be able to pinpoint the issue if the test fails again. |
@yurishkuro I think the timeout happens because the test waits indefinitely for the server to start or stop, indicating an issue with managing the server lifecycle. The error about attempting to close an already closed connection points to a synchronization problem. I have added logging to the Close methods and wrapped errors for better context, but the issue persists. Edir : now its been resolved |
Thanks for the feedback, i will open a PR for those. |
@yurishkuro I ran the test 100 times, and it passed every single time. But since the issue was flaky, I'm wondering how we can be sure it's really fixed. Any tips on what else I can do to confirm the fix? Could you take a look and let me know if there's anything else I should check? |
This test failed several times recently:
Looks like we have some kind of race condition in closing the connection more than once. When we use cmux, the listener is not owned by the http/grpc servers, but they probably each try to close it.
The text was updated successfully, but these errors were encountered: