-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
net/http: TestServerKeepAlivesEnabled_h2 test flake #18701
Comments
I see nothing suspicious or racy in the test. I'm afraid I won't be able to help on this one, just like I was unable to help last time in #14061. /cc @alexbrainman |
I run
in a loop against https://go-review.googlesource.com/#/c/35352/2 on my both windows xp and windows 7. It didn't fail even once. Alex |
Another failure from dashboard https://build.golang.org/log/cf6c2853214fb8d7a7667e0872d3bc3b8520b56d Alex |
Another from trybot: https://storage.googleapis.com/go-build-log/bebb0f25/windows-amd64-gce_fd553238.log |
Err, I just noticed this was supposed to be a windows issue, and that the error is not exactly the same. Should I open a separate issue? |
@mvdan yep, that looks like a separate issue |
One more: windows, but the error message is closer to this one than #21724. |
I applied this diff:
to 538b4ba and I can see this:
Note how there are two similar connections here. And during first connection client receives 799, 51, 62 and 239 bytes. While during second connection it fails to receive last 239 bytes. I suspect what is happening here, the server sends these 239 bytes and closes connection immediately. That forces connection closed before client can read the data. If server wants to make sure that client gets the data, it must use CloseWrite followed by multiple Read until connection is closed. Alex |
You are right. I ran the test on a Windows box with GODEBUG=http2debug=2. After receiving a request, the server replies with a HEADERS, DATA, and GOAWAY, then closes the connection. The client's ReadFrame fails immediately, before receive any of those frames. I understand why this flakes on Windows: The flake depends on whether or not the server reads the client's WINDOW_UPDATE frame before or after the server sends GOAWAY. If the WINDOW_UPDATE is not read when the connection is closed, then the kernel closes the connection with RST (instead of FIN) due to the non-empty receive buffer. I reduced this to the following repro: I don't understand why TestServerKeepAlivesEnabled_h2 flakes on Windows but not Linux. Given that I have not seen a Linux flake, I expected the above program to fail on Windows but not Linux, but it fails deterministically on both Windows and Linux. Interestingly, the above program does not fail on play.golang.org. Perhaps docker is doing something weird. In any case, the fix is simple. Currently, we close the connection immediately if we've responded to all streams that were open before we sent GOAWAY. Instead, we should keep the connection open for a short period of time so we can drain any packets from the kernel receive buffer. Ideally we'd delay about 1 RTT, under the assumption that the client won't send more frames after seeing GOAWAY, but getting a precise RTT measurement is annoying. I think it's sufficient to wait about 1 second (which we are already doing in a similar case). |
I really wouldn't know. But you said "If the WINDOW_UPDATE is not read when the connection is closed, then the kernel closes the connection with RST", maybe WINDOW_UPDATE is always read before connection is closed on Linux (for one reason or another). Is it easy enough to make server close connection before reading WINDOW_UPDATE on Linux (just to see what happens)?
I don't think play.golang.org is running docker. Is it?
If you say so. Happy to leave it for you. Alex |
Change https://golang.org/cl/71372 mentions this issue: |
Currently, we close the connection immediately after sending a GOAWAY frame if all outstanding responses have been completely sent. However, the client may have had requests in-flight at that time which have been queued in the kernel receive buffer. On both Windows and Linux, if the connection is close()'d when the receive buffer is not empty, the kernel sends RST. This has the effect of aborting both sides of the connection, meaning the client may not actually receive all responses that were sent before the GOAWAY. Instead, we should delay calling close() until after the receive buffer has been drained. We don't want to delay indefinitely, which means we need some kind of timeout. Ideally that timeout should be about 1 RTT + epsilon, under the assumption that the client will not send any more frames after receiving the GOAWAY. However, 1 RTT is difficult to measure. It turns out we were already using a 1 second delay in other cases, so we reuse that same delay here. Note that we do not call CloseWrite() to half-close the underlying TLS connection. This seems unnecessary -- GOAWAY is effectively a half-close at the HTTP/2 level. Updates golang/go#18701 (fixes after it's bundled into net/http) Change-Id: I4d68bada6369ba95e5db02afe6dfad0a393c0334 Reviewed-on: https://go-review.googlesource.com/71372 Run-TryBot: Tom Bergan <tombergan@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Change https://golang.org/cl/75210 mentions this issue: |
Change https://golang.org/cl/88319 mentions this issue: |
Currently, we close the connection immediately after sending a GOAWAY frame if all outstanding responses have been completely sent. However, the client may have had requests in-flight at that time which have been queued in the kernel receive buffer. On both Windows and Linux, if the connection is close()'d when the receive buffer is not empty, the kernel sends RST. This has the effect of aborting both sides of the connection, meaning the client may not actually receive all responses that were sent before the GOAWAY. Instead, we should delay calling close() until after the receive buffer has been drained. We don't want to delay indefinitely, which means we need some kind of timeout. Ideally that timeout should be about 1 RTT + epsilon, under the assumption that the client will not send any more frames after receiving the GOAWAY. However, 1 RTT is difficult to measure. It turns out we were already using a 1 second delay in other cases, so we reuse that same delay here. Note that we do not call CloseWrite() to half-close the underlying TLS connection. This seems unnecessary -- GOAWAY is effectively a half-close at the HTTP/2 level. Updates golang/go#18701 (fixes after it's bundled into net/http) Change-Id: I4d68bada6369ba95e5db02afe6dfad0a393c0334 Reviewed-on: https://go-review.googlesource.com/71372 Run-TryBot: Tom Bergan <tombergan@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Change https://golang.org/cl/147419 mentions this issue: |
Looks like the introduction of CCS records in the client second flight gave time to s_server to send NewSessionTicket messages in between the client application data and close_notify. There seems to be no way of turning NewSessionTicket messages off, neither by not sending a psk_key_exchange_modes extension, nor by command line flag. Interleaving the client write like that tickled an issue akin to #18701: on Windows, the client reaches Close() before the last record is drained from the send buffer, the kernel notices and resets the connection, cutting short the last flow. There is no good way of synchronizing this, so we sleep for a RTT before calling close, like in CL 75210. Sigh. Updates #9671 Change-Id: I44dc1cca17b373695b5a18c2741f218af2990bd1 Reviewed-on: https://go-review.googlesource.com/c/147419 Run-TryBot: Filippo Valsorda <filippo@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Adam Langley <agl@golang.org>
A recent trybot run showed this failure (CL is https://go-review.googlesource.com/c/35352/2):
https://storage.googleapis.com/go-build-log/ad057e0a/windows-386-gce_7ecf210b.log
/cc @bradfitz
The text was updated successfully, but these errors were encountered: